(4) Introduction


We have been developing CAD algorithms for the detection of microcalcifications and masses using
advanced image processing and computer vision techniques. Our CAD algorithms have provided very
promising results in laboratory tests. The goals of this project are to implement our CAD algorithms on a
fast workstation, develop user interfaces for efficient operation of the CAD programs, and conduct a pilot
clinical trial of the CAD schemes at two mammographic screening sites. Based on the results of the pilot
clinical trial, we can evaluate the sensitivity and specificity of the CAD algorithms, analyze the effects of
the CAD schemes on mammographic screening, identify potential problems in a clinical environment, and
develop methods to further improve the CAD schemes in the future. We believe that this is a crucial step
toward developing a clinically practical CAD workstation.

It has been recognized that digital mammography is one of the key research areas for
improvement in the diagnosis of breast cancer. Two of the major issues in digital mammography are the
technological requirements in developing high resolution digital detectors and the transmission and
archiving of the large amount of data. Data compression can reduce the amount of data for transmission
and storage. However, there is often a tradeoff between compression ratio and image fidelity. Data
compression in mammography is especially difficult because very subtle image details, such as
microcalcifications and mass margins, need to be preserved. In this project, we have developed a
CAD guided data compression technique that preserves the original image information by lossless
compression in potentially important regions on the mammograms indicated by the CAD programs. For
breast areas outside these regions, the most efficient lossy compression technique that does not cause
noticeable degradation of image details is applied. This image compression method is intended to maximize
the compression efficiency with a minimum loss of information.

With  the  support  of  this  grant  from  the  USAMRMC  Breast  Cancer  Research  Program,  we  have 
developed  a  CAD  workstation  with  a  proper  graphical  user  interface  for  a  pilot  clinical  trial.  CAD 
workstations  have  been  implemented  at  the  University  of  Michigan  and  at  the  Georgetown  University. 
We  have  recruited  about  2,500  patients  whose  mammograms  were  read  with  and  without  CAD  by 
radiologists.  The  effects  of  CAD  reading  have  been  evaluated.  We  have  also  implemented  the  CAD 
guided  data  compression  technique  for  a  data  set  of  mammograms  and  conducted  subjective  image 
quality  ranking  studies  to  compare  observer  performance  on  the  uncompressed  images  with  that  on 
images  compressed  with  the  selected  lossy  technique.  Details  of  these  studies  have  been  described  in 
previous  annual  progress  reports  and  are  summarized  in  this  final  report. 
(5) Body

During the no-cost time extension period of 9/23/00 to 9/22/01, our goal was to continue to
collect patient cases for the pilot study of the effects of CAD in mammographic screening. The images
were read by radiologists without and with CAD using the CAD workstations at the University of
Michigan and the Georgetown University.

In  this  final  report,  we  summarize  the  major  studies  performed  in  the  entire  funding  period 
(9/23/96-9/22/01)  and  the  significant  results  obtained  under  the  support  of  this  grant.  Some  of  the  details 
have  been  reported  in  the  previous  years. 

(a) CADView workstation

In the previous reports, we discussed in detail the design and operation of our PC-based CAD
workstation, "CADView", and its graphical user interface (GUI). Here we briefly review the
operation of the CADView system used in the pilot clinical study, as shown in Figure 1. The radiologist
first reads the original film mammograms on the alternator as in daily clinical practice, and then
retrieves the patient's 4-view mammogram for display on the CADView monitor by scanning the
barcode of the patient folder. The mammograms displayed on the screen are arranged in exactly the same
way as the films mounted on the alternator so that the radiologist can easily compare the corresponding
locations marked on the images. The display is placed next to the off-line alternator, and the radiologist
can easily access the keyboard and mouse. The reading process is shown in Figure 2. The radiologist
marks any potential masses on the displayed images and records their impression of the most suspicious
mass using the BI-RADS lexicon. They also select the BI-RADS action category for the mass, which is
recorded by the CAD system. Any potential microcalcification locations are then marked, and the
BI-RADS impression and action category for the microcalcifications are recorded. The computer then
displays the detected suspicious masses on the images. The radiologist reads the original films again
based on the computer prompts. The radiologist can change their initial markings of masses on the
displayed images if they are influenced by the computer output. They can also change the BI-RADS
impression and the action category for the mass. The same procedure is then performed for
microcalcifications. The markings and action categories of the radiologist before and after CAD display
are both recorded in a database file.
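The before-and-after record described above could be modeled roughly as follows. This is only an illustrative sketch: the report does not describe the actual database layout, so every class and field name here (`LesionAssessment`, `CaseReading`, `changed_by_cad`) is a hypothetical choice, and the BI-RADS impression and action category are stored as free text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) location on the displayed image; hypothetical convention


@dataclass
class LesionAssessment:
    """One reader assessment for one lesion type (mass or microcalcification)."""
    marks: List[Point] = field(default_factory=list)  # all marked locations
    most_suspicious: Optional[Point] = None           # the "double circle" location
    impression: Optional[str] = None                  # BI-RADS impression (free text here)
    action: Optional[str] = None                      # BI-RADS action category (free text here)


@dataclass
class CaseReading:
    """Assessments stored before and after the CAD output is displayed."""
    case_id: str
    mass_before: LesionAssessment
    calc_before: LesionAssessment
    mass_after: Optional[LesionAssessment] = None
    calc_after: Optional[LesionAssessment] = None

    def changed_by_cad(self) -> bool:
        """True if any post-CAD assessment differs from its pre-CAD counterpart."""
        pairs = [(self.mass_before, self.mass_after),
                 (self.calc_before, self.calc_after)]
        return any(after is not None and after != before for before, after in pairs)
```

Keeping both assessments, rather than overwriting the first, is what allows the effect of the computer prompts on each reading to be analyzed afterwards.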

Figure  3  illustrates  an  example  of  the  radiologist's  markings  on  the  displayed  images.  The  double 
circles  marked  the  location  of  the  most  suspicious  mass  in  Figure  3(a)  and  the  location  of  the  most 
suspicious  microcalcification  clusters  in  Figure  3(b).  The  sliders  on  the  right  indicated  the  BI-RADS 
impression  of  the  marked  lesions.  The  right  and  left  breasts  were  recorded  separately.  The  BI-RADS 
action  categories  for  the  lesions  were  also  selected  on  the  sliders. 

Figure  4  illustrates  the  same  example  after  the  CADView  displayed  the  computer  detection 
output.  The  computer  detected  masses  were  marked  by  arrowheads  and  the  computer  detected  clusters 
were  marked  by  dots.  The  radiologist's  original  marks  were  superimposed  on  the  computer  output.  If 
there  were  disagreements,  the  radiologist  could  double-check  the  film  mammograms  on  the  alternator  to 
resolve  the  discrepancy.  If  the  radiologist  found  additional  suspicious  locations,  he/she  would  add  new 
marks  on  the  displayed  images.  If  the  new  locations  were  deemed  more  suspicious  than  the  ones  that 
he/she  marked  before  the  computer  output  was  displayed,  they  could  move  the  double  circles  to  the  new 
locations.  The  radiologist  could  also  change  their  BI-RADS  impression  and  action  categories  on  the 
lesions  by  moving  the  pointers  on  the  sliders. 
Figure 1. The setup for the CAD reading of off-line screening mammograms. The radiologist reads the
original film mammograms on the alternator while using the computer output as a second
opinion.


[Figure 2 flowchart. Recoverable block labels: New Case; Mass (abnormal?); identification of mass
lesions and rating for the most important mass; Microcalcifications (abnormal?); identification of
microcalcification lesions and rating; CAD ON; Mass (change?); change of mass locations;
Microcalcifications (change?); change of microcalcification locations.]
Figure  2.  Sequence  of  reading  and  collection  of  the  radiologist's  BI-RADS  assessment  of  a 
mammographic  case. 


Figure 4(b). Radiologist's assessment of microcalcifications with CAD display.


(b)  Collection  of  screening  mammograms 

To date, we have collected over 1700 cases from the University of Michigan (UM) breast imaging
off-line screening sites and over 1500 cases from the Georgetown University (GU) Breast Imaging clinic.
At each site, many radiologists were involved in reading the screening mammography cases
with CAD.

(b.1) University of Michigan cases

We  have  analyzed  the  first  1665  cases.  We  do  not  have  the  callback  results  and  follow  up 
information  on  the  other  more  recent  cases  yet  because  of  the  time  delay  between  a  decision  to  call  back 
and  the  scheduled  call  back  exam.  The  number  of  callbacks,  biopsies,  and  follow-up  cases  within  the  first 
1665  participating  patients  at  the  UM  are  summarized  in  Table  1.  The  results  are  compared  with  the  GU 
data  and  the  overall  data  later  in  Table  4.  We  can  make  the  following  observations  from  the  UM  cases: 

1. For the cases in which the radiologists recommended biopsy, the computer program detected 88% (23/26)
of the lesions.

2. For the cases in which the radiologists recommended fine needle aspiration, the computer program
detected 71% (5/7) of the lesions.

3. The computer detected 83% of the malignant cases (5/6) found in this patient group. Of the five detected
cases, one was a mass case, one was a microcalcification case, and the other three were cases
manifesting both mass and microcalcifications. The computer missed one malignant mass.

4. The computer caused 30 additional call backs, of which 5 were recommended for 6-month follow-up,
indicating that the computer found some areas of concern that the radiologists would not have called
without the computer output. The development of the 6-month follow-up cases will be followed.

5. The computer caused 2 additional biopsies and 2 fine needle aspirations, all of which were found to be
benign.

6. The computer has a detection sensitivity of 75% for masses, 80% for microcalcifications, and 97% for
mixed mass and microcalcification cases, slightly lower than our predicted performance in laboratory
tests but similar to the detection sensitivity found at the Georgetown University site. These results
confirm that the performance of the CAD system is consistent in the patient population, although the
two sites use different digitizers and different mammography systems.

7. The computer missed 10 cases that were recommended for 6-month short-term follow-up. The
development of these follow-up cases will be followed.

8. The computer missed 2 fine needle aspiration cases and 3 biopsy cases, one of which was
malignant.

9. The majority (41/56) of the false negative cases were found to be normal or benign after call back.
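The percentages quoted in these observations can be recomputed from the counts in Table 1. The sketch below assumes, as the table layout suggests, that "sensitivity of computer" means the number of computer detections divided by the number of lesions called back by the radiologists; the dictionary and function names are illustrative only.

```python
# Counts copied from the "Overall" column of Table 1 (UM):
# (computer detections, radiologist call backs) for each lesion type.
UM_OVERALL = {
    "mass": (142, 190),
    "microcalcification": (28, 35),
    "mass + microcalcification": (28, 29),
}


def sensitivity_percent(detected: int, total: int) -> float:
    """Computer detections as a percentage of the radiologist-identified lesions."""
    if total == 0:
        raise ValueError("sensitivity is undefined when there are no reference lesions")
    return 100.0 * detected / total


for lesion_type, (detected, total) in UM_OVERALL.items():
    value = sensitivity_percent(detected, total)
    print(f"{lesion_type}: {value:.0f}% ({detected}/{total})")
```

Note that the reference here is the radiologists' call backs, not pathology-proven truth, so these figures describe agreement with the readers rather than absolute sensitivity.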
Table 1. The performance of the CADView detection system and its effects on radiologists' reading on
the callback cases from the first 1665 off-line screening cases at the University of Michigan.
The 12 month follow up indicates a regular annual screening schedule and these cases are thus
generally considered to be normal. FN=false negative; FU=follow up.

                                    Biopsy Fine needle     6 month    12 month     Overall      Cancer
                                            aspiration          FU          FU
Call Backs for Mass
  Radiologist detection                 13           6          39         132         190           2
  Computer detection                    10           4          31          97         142           1
  Sensitivity of computer (%)          77%         67%         79%         73%         75%         50%
Call Backs for Calcs
  Radiologist detection                  7           0          15          13          35           1
  Computer detection                     7           0          13           8          28           1
  Sensitivity of computer (%)         100%           -         87%         62%         80%        100%
Call Backs for Mass and Calcs
  Radiologist detection                  6           1          10          12          29           3
  Computer detection                     6           1          10          11          28           3
  Sensitivity of computer (%)         100%        100%        100%         93%         97%        100%
Overall Call Backs
  Radiologist detection                 26           7          64         157         254           6
  Computer detection                    23           5          54         116         198           5
  Sensitivity of computer (%)          88%         71%         84%         74%         78%         83%
Call Backs caused by CAD
  Mass                                   1           1           3          19          24           0
  Microcalcifications                    1           1           2           2           6           0
Computer False Negatives
  FN for Calcs                           0           0           2           5           7           0
  FN for Mass                            3           2           8          35          48           1
  FN for Mass and Calcs                  0           0           0           1           1           0
(b.2) Georgetown University cases

At the GU site, a total of 1574 cases were digitized and processed by the CAD system.
However, only 731 cases (46%) were reviewed in conjunction with the clinical reading by the radiologists
because of operational issues and scheduling mismatches. The numbers of callbacks, biopsies, and
follow-up cases within the 731 patients read with CAD at GU are summarized in Table 2. The data are
compared with the UM data and the overall data in Table 4.

1. For the cases in which the radiologists recommended biopsy, the computer program detected 93% (13/14)
of the lesions.

2. For the cases in which the radiologists recommended fine needle aspiration, the computer program
detected 100% (7/7) of the lesions.

3. The computer detected all six malignant cases (6/6) found in this patient group. Two were mass
cases, one was a microcalcification case, and the other three were cases manifesting both mass and
microcalcifications.

4. The computer caused 4 additional call backs, of which 1 was recommended for biopsy and found to be
malignant.

5. The computer has a detection sensitivity of 71% for masses, 81% for microcalcifications, and 91% for
mixed mass and microcalcification cases, slightly lower than our predicted performance in laboratory
tests but similar to the detection sensitivity found at the UM site. These results confirm that the
performance of the CAD system is consistent in the patient population, although the two sites use
different digitizers and different mammography systems.

6. The computer missed 5 cases that were recommended for 3 to 6 month short-term follow-up. The
development of these follow-up cases will be followed.

7. The computer missed 1 biopsy mass case, which was found to be benign.

8. The majority (25/31) of the false negative cases were found to be normal or benign after call back.
Table 2. The performance of the CADView detection system and its effects on radiologists' reading on
the callback cases from the first 731 off-line screening cases at the Georgetown University. The
12 month follow up indicates a regular annual screening schedule and these cases are thus
generally considered to be normal. FN=false negative; FU=follow up.

                                    Biopsy Fine needle     3 month     6 month    12 month     Overall      Cancer
                                            aspiration          FU          FU          FU
Call Backs for Mass
  Radiologist detection                  7           6           1           7          65          86           2
  Computer detection                     6           6           0           5          44          61           2
  Sensitivity of computer (%)          86%        100%          0%         71%         68%         71%        100%
Call Backs for Calcs
  Radiologist detection                  4           0           2           6          15          27           0
  Computer detection                     4           0           2           4          12          22           1
  Sensitivity of computer (%)         100%           -        100%         67%         80%         81%           -
Call Backs for Mass and Calcs
  Radiologist detection                  3           1           1           2           4          11           3
  Computer detection                     3           1           1           2           3          10           3
  Sensitivity of computer (%)         100%        100%        100%        100%         75%         91%        100%
Overall Call Backs
  Radiologist detection                 14           7           4          15          84         124           5
  Computer detection                    13           7           3          11          59          93           6
  Sensitivity of computer (%)          93%        100%         75%         73%         70%         75%        120%
Call Backs caused by CAD
  Mass                                   0           0           0           0           3           3           0
  Microcalcifications                    1           0           0           0           0           1           1
Computer False Negatives
  FN for Calcs                           0           0           0           2           3           5           0
  FN for Mass                            1           0           1           2          21          25           0
  FN for Mass and Calcs                  0           0           0           0           1           1           0
Table 3 and Table 4 summarize the overall results from the two institutions. Table 5 shows the
ethnic composition of the mammography patient populations at UM and GU. The ethnic composition
may affect the cancer prevalence rate in the patient population. The overall performance of the
CADView system can be seen in the fourth column of Table 4. The sensitivities for the detection of
masses (74%) and microcalcifications (81%) are slightly lower, as expected, than the sensitivities in
laboratory tests. However, the detection sensitivity for malignant cases, at 92%, is higher than that in our
laboratory data sets. Since the number of cancer cases is small, the statistical uncertainty is large.
The most promising result is that one additional cancer was detected when the radiologist used CAD
(detected cancer cases increased from 11 to 12). Although the computer and the radiologist each missed
one cancer, the missed cancers were not the same one. When they worked together, the cancer detection
sensitivity increased. This result is consistent with that of a prospective clinical trial conducted by
Freer et al. 1 in a community hospital. They found that their commercial CAD system increased the
cancer detection rate of mammographic screening by 19.5% (from 41 to 49) in 12,860 patients. Our pilot
results indicate that CAD may also be useful in academic institutions, although the gain in cancer
detection may not be as high.
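To make the remark about statistical uncertainty concrete, the sketch below computes a Wilson score interval for the 11-of-12 cancer detections, together with the relative gains quoted above. The Wilson interval is our choice for illustration; the report does not state which interval method, if any, the authors used, and the function names are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def relative_gain_percent(before: int, after: int) -> float:
    """Relative increase in detected cancers, in percent."""
    return 100.0 * (after - before) / before


low, high = wilson_interval(11, 12)  # 11 of the 12 cancers detected
print(f"Interval for 11/12: {low:.2f} to {high:.2f}")
print(f"Pilot study gain (11 to 12): {relative_gain_percent(11, 12):.1f}%")
print(f"Freer et al. gain (41 to 49): {relative_gain_percent(41, 49):.1f}%")
```

With only 12 cancers the interval spans a wide range of plausible sensitivities, which is why the 92% figure and the one-case gain should be read as indicative rather than conclusive.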

We estimated the change in the call back rate, the biopsy rate of the call back cases, and the
biopsy rate relative to the number of screening cases, when radiologists read the mammograms with and
without the influence of CAD. The call back rates without CAD were estimated from the statistics of the
general off-line screening mammography patients. The results are tabulated in Table 6 for UM and Table
7 for GU. One interesting observation is that, at UM, the call back rate without CAD was 10.4%.
For the study group with CAD, the call back rate increased to 15.3%. If the cases caused by CAD were
excluded, the call back rate was still high at 13.5%. The substantial increase in the call back rate seems to
indicate that the radiologists at UM lowered their threshold for call back, either intentionally or
unintentionally, when they worked with the CAD system, even if the computer did not point to additional
suspicious locations. This may increase their sensitivity for cancer detection at the cost of
a higher call back rate. It may also be one of the reasons that the CADView system did not increase
the cancer detection rate at UM: the radiologists were already highly alert in reading the study
cases without CAD. Fortunately, the biopsy rate did not seem to increase substantially because many of
the call back cases were found to be negative or benign. This can be seen from the decrease in the
biopsy-to-callback ratio from 19.8% to 10.2%, as shown in Table 6.
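The UM rates quoted in this paragraph can be reproduced from the Table 1 counts. The sketch assumes that the 254 overall call backs already include the 30 attributed to CAD, an assumption that is consistent with the quoted 13.5%; variable names are illustrative.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


cases = 1665                  # UM study cases analyzed
callbacks = 254               # overall call backs (Table 1)
callbacks_caused_by_cad = 30  # 24 mass + 6 microcalcification call backs (Table 1)
biopsies = 26                 # call backs with a biopsy recommendation (Table 1)

rate_with_cad = percent(callbacks, cases)
rate_excluding_cad = percent(callbacks - callbacks_caused_by_cad, cases)
biopsy_per_callback = percent(biopsies, callbacks)
biopsy_per_screened = percent(biopsies, cases)

print(f"Call back rate with CAD:        {rate_with_cad:.1f}%")
print(f"Excluding CAD-caused call backs: {rate_excluding_cad:.1f}%")
print(f"Biopsy rate per call back:      {biopsy_per_callback:.1f}%")
print(f"Biopsy rate per screened case:  {biopsy_per_screened:.1f}%")
```

The "without CAD" comparison values (10.4%, 19.8%, 2.1%) cannot be recomputed this way because they come from general screening statistics rather than from the study cohort.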

It may also be noted that the call back rate in the community hospital where the prospective CAD
study was conducted was only 6.5% without CAD. This is substantially lower than the call back rates
without CAD at UM and GU, which are 10.4% and over 20%, respectively. The lower call back rate at
the community hospital may reduce the sensitivity of the radiologists when CAD is not used. The gain
in the radiologists' sensitivity from using the CAD system can be expected to be greater when the sensitivity
of the radiologists without CAD is lower.
Table 3. The performance of the CADView detection system and its effects on radiologists' reading on
the callback cases from the total of 2396 off-line screening cases at UM and GU. The 12
month follow up indicates a regular annual screening schedule and these cases are thus
generally considered to be normal. FN=false negative; FU=follow up.

                                    Biopsy Fine needle     6 month    12 month     Overall      Cancer
                                            aspiration          FU          FU
Call Backs for Mass
  Radiologist detection                 20          12          47         197         276           4
  Computer detection                    16          10          36         141         203           3
  Sensitivity of computer (%)          80%         83%         77%         72%         74%         75%
Call Backs for Calcs
  Radiologist detection                 11           0          23          28          62           1
  Computer detection                    11           0          19          20          50           2
  Sensitivity of computer (%)         100%           -         83%         71%         81%        200%
Call Backs for Mass and Calcs
  Radiologist detection                  9           2          13          16          40           6
  Computer detection                     9           2          13          14          38           6
  Sensitivity of computer (%)         100%        100%        100%         88%         95%        100%
Overall Call Backs
  Radiologist detection                 40          14          83         241         378          11
  Computer detection                    36          12          68         175         291          11
  Sensitivity of computer (%)          90%         86%         82%         73%         77%        92%*
Call Backs caused by CAD
  Mass                                   1           1           3          22          27           0
  Microcalcifications                    2           1           2           2           7           1
Computer False Negatives
  FN for Calcs                           0           0           4           8          12        -1**
  FN for Mass                            4           2          11          56          73           1
  FN for Mass and Calcs                  0           0           0           2           2           0

Note:

*The total number of cancers in this patient cohort is 12. Both the computer and the radiologist missed
one case but they were not the same case. The sensitivities of the computer and the radiologists were
therefore both 92%.

**The "-1" means that the lesion was a false-negative of the radiologist without CAD.
Table 4. Summary of the performance of the CADView system and the effects of CAD on
radiologists' cancer detection. Note: 12 month FU = negative or benign finding.

Cases                                   UM                  GU                  Total
Recommended biopsy                      88% (23/26)         93% (13/14)         90% (36/40)
Fine needle aspiration                  71% (5/7)           100% (7/7)          86% (12/14)
Cancer                                  83% (5/6)           100% (6/6)          92% (11/12)
Mass                                    75% (142/190)       71% (61/86)         74% (203/276)
Microcalcification                      80% (28/35)         81% (22/27)         81% (50/62)
Mass+Microcalcifications                97% (28/29)         91% (10/11)         95% (38/40)
Additional call backs caused by CAD     Total = 30          Total = 4           Total = 34
                                        6 month FU = 5      6 month FU = 0      6 month FU = 5
                                        12 month FU = 21    12 month FU = 3     12 month FU = 24
Additional biopsies caused by CAD       Total = 2           Total = 1           Total = 3
                                        Benign = 2          Benign = 0          Benign = 2
                                        Malignant = 0       Malignant = 1       Malignant = 1
Additional fine needle aspiration       2 (Benign)          0                   2 (Benign)
  caused by CAD
"False negative" by computer            Biopsy = 3          Biopsy = 1          Biopsy = 4
                                        Fine needle asp = 2 Fine needle asp = 0 Fine needle asp = 2
                                        6 month FU = 10     3-6 month FU = 5    6 month FU = 15
                                        12 month FU = 41    12 month FU = 25    12 month FU = 66
Additional cancer found by CAD          0                   1                   1
Missed cancer by computer               1                   0                   1
Total cancer found by radiologist alone 6                   5                   11
Total cancer found by radiologist + CAD 6                   6                   12


Table 5. Summary of the ethnic compositions of the patient populations at the University of Michigan
(UM) and the Georgetown University (GU). The statistics are based on the general
mammographic patient populations, not on the particular patient cohort in this study.

Ethnicity*                                              UM        GU
American Indian or Alaskan native (American Indian)     0.2%      -
Asian or Pacific islander (Asian)                       2.8%      1.9%
Black, not of Hispanic origin (African American)        7.0%      19.6%
Hispanic (Spanish Surname)                              0.5%      -
White, not of Hispanic origin (Caucasian)               83.5%     78.4%
Other or Unknown                                        6.0%      0.1%

*The ethnicity text is the label used by UM. The text in parentheses is the
corresponding label used by GU.

Table 6. The average call back rate and biopsy rate. The "without CAD" rates are estimated from off-line
screening mammograms at UM from 1997-1999. The "with CAD" rates are estimated from the
patient cohort in the study.

                  Call Back Rate    Biopsy Rate for       Biopsy Rate for
                                    Call Back patients    screened patients
Without CAD       10.4%             19.8%                 2.1%
With CAD          15.3%             10.2%                 1.6%

Table 7. The average call back rate and biopsy rate. The "without CAD" rates are estimated from off-line
screening mammograms at GU. The "with CAD" rates are estimated from the patient cohort in
the study. The biopsy rate for screened patients is not available yet.

                  Call Back Rate    Biopsy Rate for       Biopsy Rate for
                                    Call Back patients    screened patients
Without CAD       22-27%            10%                   -
With CAD          17.0%             11.3%                 2.1%
(c) Image compression of mammograms using a CAD-guided wavelet compression method

For the second subproject, we collected a database of mammograms containing subtle
mammographic lesions and digitized them with a high resolution LUMISYS laser scanner. We
investigated the application of different image compression techniques to mammograms and selected the
most promising method. The images were processed with the selected CAD-guided
wavelet compression method, as shown in Figure 5, and two observer studies were conducted to
evaluate the image quality of the compressed mammograms in comparison with the uncompressed
mammograms. Detailed discussion of the image compression methods and the observer studies can be
found in our progress report from last year. We briefly summarize the results below.


[Figure 5 flowchart. Recoverable block labels: Original Mammogram; Computer Search for Suspected
Microcalcifications; Coordinates of Suspected Areas; Compute Difference for Each Path; Entropy
Coding (two branches); Compressed File (two outputs).]

Figure  5:  A  CAD  guided  compression  scheme  based  on  integer  wavelet  decomposition. 
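The two-file idea behind this scheme (a lossy file for the whole image plus a lossless residual for the suspected regions) can be illustrated with a toy sketch. This is not the authors' implementation: coarse uniform quantization stands in for the integer wavelet coder, `zlib` stands in for the entropy coder, the image is a flat list of 8-bit pixel values, and the suspected-region indices are passed in directly instead of coming from a CAD program. All function names are hypothetical.

```python
import zlib


def lossy_decode(codes, step):
    """Reconstruct approximate pixel values from quantization codes."""
    return [c * step + step // 2 for c in codes]


def compress(pixels, roi_indices, step=16):
    """Return (file_a, file_b): lossy whole image, lossless residual in the ROI.

    Assumes 8-bit pixels (0..255) so that codes and shifted residuals fit in one byte.
    """
    codes = [p // step for p in pixels]          # stand-in for the lossy coder
    approx = lossy_decode(codes, step)
    file_a = zlib.compress(bytes(codes))         # "entropy coding" of the lossy data
    # Residual = original minus lossy reconstruction, only at suspected locations.
    residual = [pixels[i] - approx[i] for i in roi_indices]
    file_b = zlib.compress(bytes(r + 128 for r in residual))
    return file_a, file_b


def decompress(file_a, file_b, roi_indices, step=16):
    """Rebuild the image; pixels inside the ROI are restored exactly."""
    pixels = lossy_decode(list(zlib.decompress(file_a)), step)
    residual = [b - 128 for b in zlib.decompress(file_b)]
    for i, r in zip(roi_indices, residual):
        pixels[i] += r
    return pixels
```

In this sketch, decoding file A alone gives the lossy image everywhere, while adding file B restores the original values at the suspected locations, mirroring the "file A" and "file B" roles described in the observer study below.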


(c.1) Observer experiments and results
(1).  The  First  Observer  Study 

An experienced breast radiologist viewed a hundred sets of images with four different
compression modes and rated their subjective impressions of the relative quality of the images. Each
set of images is a pair consisting of the original and one of three compression modes. The three
compression modes are: (i) 0.3 bit/pixel data wavelet encoded in compressed file A (i.e., entire breast)
with the residual data for lossless compression of suspected calcifications in file B (i.e., suspicious
locations), (ii) 0.1 bit/pixel data wavelet encoded in compressed file A with the residual data for lossless
compression of suspected calcifications in file B, and (iii) 0.1 bit/pixel data wavelet encoded in file A only.
Each pair of decompressed and original images was displayed on two monitors, with the side (right or
left) assigned at random. The reader was asked to rate image quality in terms of calcification
observability, edge sharpness, overall image quality, and noise appearance for all images. If the reader
favored one image for a specific feature, one of the two boxes (left and right) could be checked to
indicate his/her preference.
The average compression ratios and computed mean-square-errors (MSE) between the original
and decompressed images are shown in Table 8. We found that the CAD guided compression method
achieved only a very small MSE improvement although it used a significant amount of computer storage
space (i.e., bit rate = total number of bits used to encode the data / total number of pixels in the image)
to preserve the full data accuracy of the suspected calcifications. This is mainly because the suspected
microcalcifications occupy a very small area compared to the whole breast region.
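The quantities in this paragraph can be written out as follows. The bit rate definition is taken from the text; the compression ratio helper assumes 12 bits per pixel in the original images, which the report does not state but which is consistent with 0.1 bit/pixel corresponding to a ratio of 120:1. Function names are illustrative.

```python
def bit_rate(total_bits: int, total_pixels: int) -> float:
    """Conventional bit rate: bits used to encode the data per image pixel."""
    return total_bits / total_pixels


def compression_ratio(compressed_bit_rate: float, original_bits_per_pixel: float = 12.0) -> float:
    """Ratio of original to compressed size; 12 bits/pixel is an assumed depth."""
    return original_bits_per_pixel / compressed_bit_rate


def mean_square_error(original, decompressed) -> float:
    """Average squared pixel difference between two equally sized images."""
    if len(original) != len(decompressed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, decompressed)) / len(original)


for rate in (0.43, 0.23, 0.1):
    print(f"{rate} bit/pixel -> about {compression_ratio(rate):.0f}:1")
```

Because the MSE is averaged over every pixel, restoring a few small regions exactly changes it very little, which is the effect the paragraph describes.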

Table 8. Compression Ratios and Mean-Square-Errors of the Three Compression Modes in the First
Observer Study. The conventional bit rate is the total number of bits used to encode the data divided
by the total number of pixels in the image; the area-equalized (AEQ) bit rate is the total number of
bits used to encode the data divided by the total number of pixels within the breast.

Mode                  A                       B                       C
Procedure             0.3 bit/pixel           0.1 bit/pixel           0.1 bit/pixel
                      + lossless for spots    + lossless for spots
Average Bit Rate      0.43 bit/pixel          0.23 bit/pixel          0.1 bit/pixel
Compression Ratio     27:1                    52:1                    120:1
Mean Square Error     50.73                   102.72                  105.63
                      (36.81)                 (62.48)                 (63.97)

From the radiologist's qualitative measures in comparing the original and compressed image pairs,
we found that no difference could be observed between the original and decompressed images at a bit
rate of 0.43 bit/pixel. In fact, it is interesting that the radiologist seemed slightly in favor of the
appearance of microcalcifications and edges in the compressed mammograms. The radiologist identified
20% of the compressed images at 0.1 bit rate as suffering from minor blurring artifacts and 6% of the
compressed images as possessing greater edge sharpness. Without lossless compression for
microcalcifications, the radiologist identified less sharp microcalcifications on 20% of the
compressed mammograms at 0.1 bit rate. The radiologist also found that 18% and 6% of the
compressed images at 0.1 bit rate possessed degraded overall image quality and higher image noise,
respectively. Degradation of image quality in compressed images at 0.1 bit rate is highly associated with
unsharpness of microcalcifications and edges. The image quality degradation at 0.1 bit rate is also
correlated with the size of the breast area. It is estimated that if the breast occupies more than one
half of the entire mammogram, degradation in image quality and edge unsharpness would be observed by
the radiologist.

We also studied the relationship between compression rate and breast area. For compression rates
higher than or equal to 0.1 bit/pixel and breast area less than or equal to 40% of the image, no
degradation relative to the original could be identified in overall image quality, overall noise
pattern, or edge sharpness. For compression rates higher than or equal to 0.1 bit/pixel and breast
area less than or equal to 25%, no degradation of the microcalcifications could be identified.
Therefore, we estimated that the threshold of the area-equalized compression rate (AEQ bit rate =
total number of bits used to encode the data / total number of pixels within the breast) for the
background including edges is 0.25 bit/pixel (0.1 bit/pixel divided by 40%) and the threshold of the
AEQ compression rate for the microcalcifications is approximately 0.4 bit/pixel (0.1 bit/pixel
divided by 25%).
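
The bit-rate definitions above can be illustrated with a short sketch. It is only an illustration
under stated assumptions: the function names, the toy image size, and the 12 bits per pixel of the
uncompressed image are hypothetical choices (12 bits is roughly consistent with the 120:1 ratio
reported at 0.1 bit/pixel, but the report does not state the original bit depth here).

```python
import numpy as np

def bit_rate(total_bits, image_shape):
    """Conventional bit rate: encoded bits divided by all pixels in the image."""
    return total_bits / (image_shape[0] * image_shape[1])

def aeq_bit_rate(total_bits, breast_mask):
    """Area-equalized bit rate: encoded bits divided by pixels inside the breast."""
    return total_bits / int(np.count_nonzero(breast_mask))

def compression_ratio(original_bits_per_pixel, rate):
    """Ratio of the uncompressed bit depth to the compressed bit rate."""
    return original_bits_per_pixel / rate

def mse(original, decompressed):
    """Mean-square-error between two images of the same shape."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(decompressed, dtype=np.float64)
    return float(np.mean(diff ** 2))

# Toy case (hypothetical numbers): a 100 x 100 image in which the breast
# occupies 40% of the pixels, encoded with 1000 bits in total.
shape = (100, 100)
mask = np.zeros(shape, dtype=bool)
mask[:, :40] = True
total_bits = 1000

conventional = bit_rate(total_bits, shape)   # 0.1 bit/pixel over the whole image
aeq = aeq_bit_rate(total_bits, mask)         # 0.25 bit/pixel over the breast only
ratio = compression_ratio(12, conventional)  # assumes 12 bits/pixel originals
```

With the breast covering 40% of the image, a conventional rate of 0.1 bit/pixel corresponds to an
AEQ rate of 0.25 bit/pixel, which is the same division used to derive the background threshold.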

(2).  The  Second  Observer  Study 




In  this  experiment,  we  compared  two  different  compression  methods:  (1)  using  an  area-equalized 
compression  rate  at  0.25  bit/pixel  with  preservation  of  microcalcifications  to  compress  and  decompress 
the  mammograms  and  (2)  using  an  area-equalized  compression  rate  at  0.4  bit/pixel  to  compress  and 
decompress  the  mammograms. 

Table 9. Qualitative measures by comparing the paired images in the Second Observer Study
(Compression Methods 1 and 2).

                                  Micro-           Edge        Overall         Overall
                                  calcifications   Sharpness   Image Quality   Noise Pattern
Total                             100              100         100             100
of which:
  In favor of the first method    55               8           0               0
  In favor of the second method   0                0           0               0
  No Difference                   45               92          100             100

Table 10. Compression Ratios and Mean-Square-Errors of the Two Compression Methods in the Second
Observer Study.

                                  The First Method                  The Second Method
                                  (0.25 AEQ bit/pixel               (0.40 AEQ bit/pixel)
                                  + lossless spots)
                                  Bit Rate (bit/pixel)              Bit Rate (bit/pixel)
                                  Mean (SD)              MSE        Mean (SD)              MSE
All                               0.149 (0.05)           94         0.141 (0.05)           55
Microcalcifications:
  In favor of the first method    0.146 (0.06)           94         0.135 (0.05)           65
  No Difference                   0.152 (0.04)           93         0.148 (0.05)           53
Edge Sharpness:
  In favor of the first method    0.195 (0.09)           92         0.159 (0.08)           49
  No Difference                   0.145 (0.05)           95         0.140 (0.05)           55

The image display and rating method were similar to those in the first experiment. The results
indicated that no image was rated better than its counterpart by the radiologist. However, the
radiologist favored the microcalcifications in 55 cases that were compressed and decompressed
through the first method (i.e., 0.25 AEQ bit/pixel with preservation of microcalcifications). The
radiologist also favored the edge characteristics in 8 cases that were compressed and decompressed
through the first method. No image was identified as a higher quality image than its counterpart in
terms of overall image quality or overall noise pattern. No image compressed by the second
compression method (i.e., 0.4 AEQ bit/pixel) was favored by the radiologist. Table 9 summarizes the
results of the observer study. Table 10 shows the bit rate used and the average MSE of the
decompressed images for each category. Note that the bit rate of the first method includes the
wavelet compressed data and the lossless compressed data of the suspected calcification areas.
Although the first compression method spent less computer space to code the overall breast area
than the second method did, it used more computer space to preserve the 10x10 pixel area of each
suspected microcalcification, so the effective compression bit rates were approximately the same
for both methods. We found that the first method produced higher quality for clinically significant
features. Although the overall MSEs produced by the first compression method were markedly worse
than those produced by the second method, the degradation was not observable by the breast
radiologist, indicating that the first compression method generates error-free suspected
calcifications that were appreciable to and favored by the radiologist.
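
The idea of keeping the suspected calcification areas error-free can be sketched as follows. This
is a minimal illustration, not the project's implementation: coarse quantization stands in for the
lossy wavelet codec, and the function name, the window handling, and the spot locations are
hypothetical. Only the 10x10 pixel window size comes from the text.

```python
import numpy as np

def preserve_regions(original, lossy, centers, box=10):
    """Copy a box x box window around each CAD-marked center from the
    original image into the lossy reconstruction; return image and mask."""
    out = lossy.copy()
    mask = np.zeros(original.shape, dtype=bool)
    half = box // 2
    for r, c in centers:
        r0, c0 = max(r - half, 0), max(c - half, 0)
        r1 = min(r0 + box, original.shape[0])
        c1 = min(c0 + box, original.shape[1])
        out[r0:r1, c0:c1] = original[r0:r1, c0:c1]
        mask[r0:r1, c0:c1] = True
    return out, mask

def mse(a, b):
    """Mean-square-error between two images of the same shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

rng = np.random.default_rng(0)
original = rng.integers(0, 4096, size=(64, 64)).astype(np.int32)

# Stand-in for a lossy codec (an assumption, not wavelet compression):
# quantize gray levels to multiples of 64.
lossy = (original // 64) * 64 + 32

# Hypothetical locations of suspected microcalcifications from a CAD program.
centers = [(20, 20), (40, 50)]
hybrid, mask = preserve_regions(original, lossy, centers)
```

Inside the mask the hybrid image is identical to the original, while the MSE over the whole image
changes only slightly because the preserved windows cover a small fraction of the pixels.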

(c.2)  Conclusions  and  discussion  of  the  compression  studies 

In this study, we used conventional compression testing methods with and without the CAD guidance
to evaluate the decompressed images. We were able to identify the threshold of area-equalized bit
rate for the overall breast area and the threshold for encoding quality microcalcifications. We
used these two thresholds to compress the mammograms. All four image-quality categories of all
compressed images were deemed more than adequate. However, the radiologist favored fully preserved
microcalcifications on 55 out of 100 images (55% of the test database). This study also showed that
neither edge nor overall image quality degradation could be observed by the radiologist at
area-equalized bit rates of 0.25 AEQ bit/pixel and 0.4 AEQ bit/pixel. Therefore, CAD can be used to
guide an image processing method to preserve or enhance clinically significant features. Our
results indicate that the CAD-guided compression method with an adequate bit rate will fully
preserve the quality of microcalcifications and suspected microcalcifications without sacrificing
the edge sharpness and overall image quality. The radiologist could not recognize any blocky
artifact at the boundaries between lossless and lossy regions, even on a magnified view with a
contrast-adjustable display.

(d) Improvement of microcalcification detection by optimization of the neural network for pattern
recognition

The computer program that we developed to automatically detect microcalcification clusters on
digitized mammograms has five stages: signal-to-noise ratio enhancement of the mammogram,
prescreening for suspicious locations of microcalcifications, rule-based false positive (FP)
reduction, pattern recognition with an artificial convolution neural network (CNN), and regional
clustering for identifying suspicious clustered microcalcifications. With the support in part from
this grant, we evaluated the effectiveness of optimal neural network architecture selection on the
performance of this microcalcification detection CAD system.

In this study, we evaluated the effectiveness of using an automated optimization technique in
selecting the optimal CNN architecture in comparison with the previous manual optimization. Three
automated optimization methods were compared: steepest gradient descent (SD), genetic algorithm
(GA), and simulated annealing (SA), for their efficiency in reaching the optimum in the
multidimensional parameter space of the CNN architectures. It was found that both the GA and SA
could reach the global optimum whereas the SD was often trapped in local optima. The SA with the
Boltzmann annealing schedule was the most efficient for this optimization problem. We conducted a
study to evaluate the improvement in the accuracy of the microcalcification detection system by the
optimized CNN in comparison to that with the manually optimized CNN (enclosed in Appendix). For
this evaluation, we used a three-stage approach: training, validation, and testing. Three
independent data sets were used in the three stages. The test data set for the testing stage
included 472 mammograms selected from the University of South Florida public digital mammography
database and contained a total of 253 biopsy-proven malignant clusters. Free-response receiver
operating characteristic (FROC) analysis was used to evaluate the tradeoff between detection
sensitivity and the number of FPs per image. At an FP rate of 0.7 per image, the microcalcification
detection program achieved a film-based sensitivity of 84.6% with the optimized CNN, in comparison
with 77.2% with the manually selected CNN. If clusters having images in both craniocaudal (CC) and
mediolateral oblique (MLO) views were analyzed and a cluster was considered to be detected when it
was detected in one or both views, at 0.7 FPs/image, the sensitivity was 93.3% with the optimized
CNN and 87.0% with the manually selected CNN. This study indicated that an optimized CNN can
effectively reduce FPs and improve the detection accuracy of the computer-aided detection system.
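
As an illustration of simulated annealing with a Boltzmann schedule, the sketch below searches a
small integer parameter grid. It is a generic textbook form under stated assumptions, not the
optimizer used in the study: the schedule T(k) = T0 / ln(1 + k) is the classical Boltzmann
schedule, while the toy cost function, the two integer parameters, the bounds, and all names are
hypothetical stand-ins for CNN architecture parameters and a detection-performance cost.

```python
import math
import random

def boltzmann_temperature(t0, k):
    """Classical Boltzmann annealing schedule, valid for k >= 1."""
    return t0 / math.log(1.0 + k)

def simulated_annealing(cost, start, bounds, t0=5.0, steps=2000, seed=0):
    """Minimize cost over an integer grid by single-parameter +/-1 moves."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for k in range(1, steps + 1):
        t = boltzmann_temperature(t0, k)
        # Propose a neighbor: change one parameter by one step, clipped to bounds.
        i = rng.randrange(len(current))
        lo, hi = bounds[i]
        candidate = list(current)
        candidate[i] = min(hi, max(lo, candidate[i] + rng.choice((-1, 1))))
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

def toy_cost(p):
    """Hypothetical cost with local minima; lower is better."""
    return (p[0] - 8) ** 2 + (p[1] - 4) ** 2 + 6 * (p[0] % 3 == 0)

start = [1, 1]
bounds = [(1, 12), (1, 12)]
best, best_cost = simulated_annealing(toy_cost, start, bounds)
```

Accepting some uphill moves at high temperature is what lets the search leave local optima, which
a pure steepest-descent search cannot do.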


(e) Evaluation of CAD mass detection algorithm with independent cases

In this study, we analyzed the performance of our CAD algorithm for detection of breast masses on
independent clinical mammograms (enclosed in Appendix). A digitized mammogram is processed with an
adaptive enhancement filter followed by a local border refinement stage. Features are then
extracted from each detected structure and used to identify potential masses. We evaluated the
performance of the algorithm on independent cases obtained from 263 patients from two institutions.
The CAD marker rate was estimated by applying the algorithm to 503 normal films. The computer
detected a malignant mass in 83% (130/156) of the malignant cases at a marker rate of 1.0 marks per
film. The detection accuracy for benign lesions was lower than that for malignant masses. FROC
performance curves were obtained and the tradeoff between detection sensitivity and the number of
CAD marks was analyzed. A performance comparison between cases collected at the two different
institutions was also included.
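
The FROC tradeoff described above can be sketched with a few lines of code. This is a simplified
illustration with made-up scores, not the evaluation code used in the study; the function name and
the data layout (one best score per lesion, or None when no mark overlaps the lesion, plus a list
of scores for marks on normal films) are assumptions.

```python
def froc_points(lesion_scores, fp_scores, n_normal_films, thresholds):
    """Return (marks per film, sensitivity) at each decision threshold.

    lesion_scores: best CAD score overlapping each true lesion, or None if missed.
    fp_scores: scores of all marks produced on the normal films.
    """
    points = []
    for t in thresholds:
        true_pos = sum(1 for s in lesion_scores if s is not None and s >= t)
        false_pos = sum(1 for s in fp_scores if s >= t)
        points.append((false_pos / n_normal_films, true_pos / len(lesion_scores)))
    return points

# Hypothetical toy data: 4 lesions (one never marked) and 4 normal films.
lesion_scores = [0.9, 0.8, 0.4, None]
fp_scores = [0.95, 0.5, 0.3, 0.2]
curve = froc_points(lesion_scores, fp_scores, n_normal_films=4,
                    thresholds=[0.85, 0.45, 0.1])
```

Lowering the threshold moves along the curve toward higher sensitivity at the price of more marks
per film; an operating point such as 1.0 marks per film is read off this curve.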

In an additional study, we evaluated the performance of the mass detection program on prior
mammograms in which the mass was not sent for biopsy in that year. These patients were found to
have a biopsy-proven malignant mass on their mammograms in a future year. A data set of 38 patients
with mammograms from 1 to 4 years prior to biopsy was collected. The computer detected the
malignant mass in 48% (13/27) of the prior cases at a marker rate of 1.0 marks per film. This
preliminary result indicates that the mass detection program can detect a substantial fraction of
the malignant masses in a prior year, demonstrating the potential that the CAD system may be able
to alert radiologists to suspicious masses and lead to earlier breast cancer detection.

(f) Improvement of computerized mass detection on mammograms: fusion of two-view information

Recent clinical studies have shown that CAD systems are helpful for improving lesion detection by
radiologists in mammography. However, these systems would be more useful if the FP rate were
further reduced. Current CAD systems generally detect and characterize suspicious abnormal
structures in individual mammographic images. Clinical experience of radiologists indicates that
screening with two mammographic views improves the detection accuracy of abnormalities in the
breast. It is expected that fusion of information from different mammographic views will improve
the performance of CAD systems. With the support in part from this grant, we are developing a
two-view matching method that utilizes the geometric locations and the morphological and textural
features to correlate objects detected in two different views using a prescreening program
(enclosed in Appendix). First, a geometrical model is used to predict the search region for an
object in a second view from its location in the first view. The distance between the object and
the nipple is used to define the search area. After pairing the objects in two views, textural and
morphological characteristics of the paired objects are merged and similarity measures are defined.
Linear discriminant analysis is then employed to classify each object pair as a true or false mass
pair. The resulting object correspondence score is combined with its one-view detection score using
a fusion scheme. The fused information was found to improve the lesion detectability and reduce the
number of FPs. In a preliminary study, we used a data set of 169 pairs of craniocaudal (CC) and
mediolateral oblique (MLO) view mammograms. For the detection of malignant masses on current
mammograms, the film-based detection sensitivity was found to improve from 62% with a one-view
detection scheme to 73% with the new two-view scheme, at a false-positive rate of 1 FP/image. The
corresponding case-based detection sensitivity improved from 77% to 91%.
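
The pairing and fusion steps can be sketched in simplified form. Everything below is a hedged
illustration rather than the method in the appended paper: the report states only that the
nipple-to-object distance defines the search area and that the correspondence score is combined
with the one-view score, so the distance tolerance, the weighted-average fusion rule, the
coordinates, and all function names are assumptions.

```python
import math

def nipple_distance(obj_xy, nipple_xy):
    """Euclidean distance from a detected object to the nipple location."""
    return math.hypot(obj_xy[0] - nipple_xy[0], obj_xy[1] - nipple_xy[1])

def candidate_pairs(cc_objects, mlo_objects, cc_nipple, mlo_nipple, tolerance):
    """Pair objects whose nipple distances in the two views agree within
    a tolerance (a hypothetical form of the search region)."""
    pairs = []
    for i, a in enumerate(cc_objects):
        da = nipple_distance(a, cc_nipple)
        for j, b in enumerate(mlo_objects):
            db = nipple_distance(b, mlo_nipple)
            if abs(da - db) <= tolerance:
                pairs.append((i, j))
    return pairs

def fuse(one_view_score, correspondence_score, weight=0.5):
    """Assumed fusion rule: weighted average of the two scores."""
    return weight * one_view_score + (1.0 - weight) * correspondence_score

# Hypothetical object locations (pixels) in each view, nipples at the origin.
cc_objects = [(30, 40), (100, 0)]
mlo_objects = [(0, 55), (60, 80)]
pairs = candidate_pairs(cc_objects, mlo_objects, (0, 0), (0, 0), tolerance=10)
fused = fuse(0.6, 0.8)
```

In the described method the correspondence score comes from a linear discriminant classifier
applied to the merged features of each pair; the sketch leaves that classifier out.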

(g)  Optimization  of  wavelet  decomposition  for  image  compression  and  feature  preservation 

As a step in the subproject of CAD-guided data compression for mammography, we investigated
different data compression techniques for mammograms and other medical images. We have developed a
neural network system that can search for an optimal wavelet kernel for a specific image processing
task. In this study, a linear convolution neural network was employed to obtain a wavelet that
minimizes errors and maximizes compression efficiency for an image or a defined image pattern such
as microcalcifications on mammograms. We have used this method to evaluate the performance of 4-tap
wavelets on mammograms, computed tomograms (CTs), magnetic resonance images (MRIs), and the Lena
images. We found that the Daubechies wavelet and wavelets possessing similar filtering
characteristics produce a high compression efficiency with the smallest mean-square-error. However,
the Haar wavelet produces the best results on sharp edges and low-noise smooth areas. We also found
that a special wavelet, whose low-pass filter coefficients are (0.32252136, 0.85258927, 0.38458542,
-0.14548269), preserves the microcalcification features well in terms of peak signal-to-noise
ratio, contrast, and figure of merit during compression. The technical details of this study can be
found in the paper by Lo et al. (enclosed in Appendix).
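
The quoted low-pass coefficients can be examined with a small numerical sketch. The coefficients
are taken from the text; everything else is an assumption made for illustration: the high-pass
filter is built with the standard quadrature-mirror relation g[k] = (-1)^k h[3-k], the signal is
extended periodically, and only one decomposition level is computed. Under those assumptions the
filter pair behaves like an orthonormal 4-tap wavelet, so analysis followed by synthesis should
return the input up to the rounding of the published coefficients.

```python
import numpy as np

# Low-pass coefficients quoted in the report.
H = np.array([0.32252136, 0.85258927, 0.38458542, -0.14548269])
# Assumed high-pass filter from the quadrature-mirror relation.
G = np.array([H[3], -H[2], H[1], -H[0]])

def analyze(x):
    """One-level decomposition with periodic extension (even-length input)."""
    n = len(x)
    approx = np.zeros(n // 2)
    detail = np.zeros(n // 2)
    for m in range(n // 2):
        for k in range(4):
            sample = x[(2 * m + k) % n]
            approx[m] += H[k] * sample
            detail[m] += G[k] * sample
    return approx, detail

def synthesize(approx, detail):
    """Inverse of analyze for an orthonormal filter pair (transpose of the transform)."""
    n = 2 * len(approx)
    x = np.zeros(n)
    for m in range(n // 2):
        for k in range(4):
            x[(2 * m + k) % n] += H[k] * approx[m] + G[k] * detail[m]
    return x

signal = np.sin(np.linspace(0.0, 3.0, 16))
approx, detail = analyze(signal)
recon = synthesize(approx, detail)

coefficient_sum = float(H.sum())            # close to sqrt(2)
energy = float(H @ H)                       # close to 1
shift_orthogonality = float(H[0] * H[2] + H[1] * H[3])  # close to 0
```

The three scalar checks are the usual orthonormality conditions for a 4-tap filter; the small
residuals reflect the eight-digit rounding of the coefficients.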

(h) Predictive decomposition as a framework in dyadic transforms - A unified theory for wavelet and
subband decompositions

In this research, we derived a generalized decomposition method, Haar + Prediction + Composite
(H+PC), based on the Haar transform. This general form can exactly describe dyadic transforms.
Another general form, Biorthogonal + Prediction + Composite (B+PC), which is a subset of the
doublet system and is based on the binomial filter, can describe triplet-type decompositions
including whole-point symmetric biorthogonal transformations. Both systems can be unified by the
delta function basis decomposition system, Delta + Prediction + Composite (D+PC). We also found
that these three bases and their expansions using predictive approximation form the dyadic
decomposition family. Wavelet and integer wavelet based decomposition methods can also be included
in this unified framework. This framework clearly bridges the relationship among various types of
dyadic transforms. To confirm this theory, we performed a computational exercise and found that
almost all dyadic decompositions can be directly computed from their basis. A paper based on this
research has been submitted to IEEE Signal Processing for review. The technical details of this
study can be found in the paper by Lo et al. (enclosed in the Appendix).

(6) Key Research Accomplishments


• Developed the graphical user interface for the CADView system, and implemented the mass
detection and microcalcification detection programs in the system for automatic image
processing of the digitized mammogram.

• Installed the CADView workstations at the Breast Imaging clinics of the University of Michigan
Health System and at the Georgetown University Medical Center, and conducted the pilot
clinical study.

•  Completed  the  pilot  clinical  study  at  the  UM  and  GU  mammography  screening  sites  with 
2,400  patient  mammograms  read  with  CAD. 

• Analyzed the data and found that CAD increased the number of cancers detected from 11 to 12.

•  Analyzed  the  effects  of  CAD  on  radiologists'  reading  based  on  the  data  collected  from  the 
pilot  clinical  study. 

• Analyzed the performance of the CADView system in the patient population, and compared the
performances at the two sites and those in laboratory tests.

•  Continued  improvement  of  the  mass  and  microcalcification  detection  programs,  independent 
of  the  versions  implemented  in  the  CADView  system,  which  were  fixed  throughout  the  pilot 
study. 

• Investigated various image compression approaches for mammography and selected a wavelet
compression method for the CAD-guided compression of mammograms.

• Conducted two observer performance studies to compare microcalcification detection on
mammograms without compression, with conventional compression, and with CAD-guided
compression.

• Analyzed the results of the observer performance studies and estimated the best compression
rate for the CAD-guided compression method.

•  Published  a  number  of  peer-reviewed  papers  in  the  various  topics  related  to  this  project. 

(7) Reportable Outcomes


Publications  related  to  the  development  of  the  CAD  system  and  the  evaluation  of  the  effects  of  the  CAD 
system: 


Peer-Reviewed  Journal  Articles 
classifier for computer-aided diagnosis: Application to classification of malignant and benign
masses. Physics in Medicine and Biology 1998; 43: 2853-2871.

2.  Chan  HP,  Sahiner  B,  Lam  KL,  Petrick  N,  Helvie  MA,  Goodsitt  MM,  Adler  DD.  Computerized 
Physics 1998; 25: 2007-2019.
1999; 26: 1642-1654.

4.  Chan  HP,  Sahiner  B,  Helvie  MA,  Petrick  N,  Roubidoux  MA,  Wilson  TE,  Adler  DD,  Paramagul  C, 

5. Chan HP, Sahiner B, Wagner RF, Petrick N. Classifier design for computer-aided diagnosis:
Effects of finite sample size on the mean performance of classical and neural network classifiers.
Medical Physics 1999; 26: 2654-2668.

6.  Sanjay-Gopal  S,  Chan  HP,  Wilson  TE,  Helvie  MA,  Petrick  N,  Sahiner  B.  A  regional  registration 
Physics 1999; 26: 2669-2679.

7.  Hadjiiski  LM,  Sahiner  B,  Chan  HP,  Petrick  N,  Helvie  MA.  Classification  of  malignant  and  benign 

(8) Conclusions


We have completed the pilot clinical study of the effects of CAD on radiologists' reading of
screening mammograms. We have collected over 2,500 cases and analyzed the results of about 2,400
cases. The overall sensitivity of the CADView system was found to be reasonably close to our
prediction based on laboratory tests and was consistent between the two sites. The computer
detected 90% of the lesions that were recommended for biopsy at both sites, and 86% of the fine
needle biopsy cases at the two sites. The CAD system detected 82% of the short-term follow-up
cases. Whether any of the missed short-term follow-up cases will turn out to be malignant remains
to be followed. The CAD system caused 30 additional callbacks at the UM site, of which 5 were
recommended for short-term follow-up. We will track these follow-up cases to determine if any of
them will turn out to be malignant. The CAD system caused only 4 additional callbacks at the GU
site, and one of these was found to be malignant. The CAD system detected 5 of the 6 malignant
cases at the UM site, while causing 2 additional benign biopsies. It detected all 6 malignant cases
at the GU site, one of which was not originally called by the radiologist. The total number of
cancers detected was therefore increased from 11 to 12 in this patient cohort. Although the number
of cancers in this pilot study is small and the statistical uncertainty is large, our results
indicate that the CAD system can increase the sensitivity of breast cancer detection for screening
mammography in academic centers. This information is complementary to the findings of a larger
study of the effects of a commercial CAD system on screening in a community hospital, in which CAD
was found to increase cancer detection substantially from 41 to 49 in 12,860 patients.
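
The size of the statistical uncertainty mentioned above can be illustrated with a confidence
interval for a proportion. This is a sketch under stated assumptions and not an analysis from the
study: it uses the Wilson score interval, a 95% level (z = 1.96), and treats the 12 cancers found
with CAD as the denominator, which ignores any cancers missed by both the radiologists and the
computer.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# 11 of the 12 cancers were called without CAD, 12 of 12 with CAD.
low_without, high_without = wilson_interval(11, 12)
low_with, high_with = wilson_interval(12, 12)
```

With only 12 cancers the two intervals overlap widely, which is consistent with the statement that
the pilot study is too small for a statistically significant conclusion on sensitivity.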

Since the cancer rate in the screening population is low, the number of patients recruited for this
pilot clinical study is not sufficient to draw statistically significant conclusions on the effects
of CAD on the sensitivity of mammographic screening. However, this pilot study provides an
evaluation of the performance of the CAD system in the clinical screening environment and, more
importantly, an assessment of the effects of CAD on the callback rate of the radiologists for
reading screening mammograms. At the UM site, the callback rate seemed to increase substantially
when the radiologists read mammograms with the CAD system in this study. However, the majority of
the callbacks were not caused by the markers from the computer. The radiologists seemed to have
reduced their threshold for callback, probably because they did not want to miss lesions that might
be pointed out by the computer. Whether this competitive phenomenon will persist if the
radiologists have to read every screening mammogram with CAD routinely remains to be seen. However,
this heightened alert level will reduce the probability of false-negative diagnosis by
radiologists, partly serving the purpose of CAD. Nevertheless, the increased callback rate did not
seem to increase the biopsy rate substantially because most of the callback cases were found to be
benign or negative upon workup. The results obtained from this pilot study will be important for
the design of a large-scale pivotal clinical study in the future to further investigate these
issues.


Two observer performance studies have been conducted for the CAD-guided image compression project.
It was found that the CAD-guided compression method with an adequate bit rate will fully preserve
the quality of microcalcifications and suspected microcalcifications without sacrificing the edge
sharpness and overall image quality. Neither edge nor overall image quality degradation could be
observed by the radiologist at area-equalized bit rates of 0.25 bit/pixel and 0.4 bit/pixel. The
CAD-guided compression can therefore reduce the image transmission and storage requirements for
digital mammograms by a factor of 30 to 50 without causing perceivable degradation of image
quality. An effective image compression method for picture archiving and communication will
facilitate the implementation of telemammography and digital mammography. Both approaches are
expected to improve patient care, especially in remote and rural areas.

RATIONALE AND OBJECTIVES: To evaluate the effectiveness of optimal neural
network architecture selection on the performance of a CAD system designed for the
detection of microcalcification clusters on digitized mammograms.

MATERIALS  AND  METHODS:  We  have  developed  a  computer  program  to  automatically 
detect  microcalcification  clusters  on  digitized  mammograms.  Previously,  we  have  found 
that  a  properly  selected  and  trained  convolution  neural  network  (CNN)  could  reduce  false 
positives  (FPs)  and  therefore  improve  the  accuracy  of  microcalcification  detection.  In  this 
work,  we  evaluated  the  effectiveness  of  the  CNN  optimized  with  an  automated 
optimization  technique  in  improving  the  accuracy  of  the  microcalcification  detection 
program  in  comparison  with  the  previously  manually  selected  CNN.  For  this  evaluation, 
an  independent  test  data  set  was  used,  which  included  472  mammograms  selected  from 
the  University  of  South  Florida  public  database  and  contained  a  total  of  253  biopsy- 
proven  malignant  clusters. 

RESULTS:  At  an  FP  rate  of  0.7  per  image,  the  film-based  sensitivity  was  84.6%  with  the 
optimized  CNN,  in  comparison  with  77.2%  with  the  manually  selected  CNN.  If  clusters 
having  images  in  both  craniocaudal  (CC)  and  mediolateral  oblique  (MLO)  views  were 
analyzed  and  a  cluster  was  considered  to  be  detected  when  it  was  detected  in  one  or  both 
views,  at  0.7  FPs/image,  the  sensitivity  was  93.3%  with  the  optimized  CNN  and  87.0% 
with  the  manually  selected  CNN. 

CONCLUSION:  Classification  of  true  and  false  signals  is  an  important  step  in  the 
microcalcification  detection  program.  An  optimized  CNN  can  effectively  reduce  FPs  and 
improve  the  detection  accuracy  of  the  computer-aided  detection  system. 
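
The difference between the film-based and the two-view (case-based) sensitivities reported in the
RESULTS can be sketched as follows. The detection flags are made-up toy data and the function
names are hypothetical; only the counting rule (a cluster counts as detected when it is detected
in one or both views) comes from the text.

```python
def film_based_sensitivity(detections):
    """detections: list of (cc_detected, mlo_detected) per cluster.
    Each view of each cluster counts as a separate detection target."""
    flags = [flag for pair in detections for flag in pair]
    return sum(flags) / len(flags)

def case_based_sensitivity(detections):
    """A cluster counts as detected if it is found in one or both views."""
    return sum(1 for cc, mlo in detections if cc or mlo) / len(detections)

# Hypothetical toy data for four clusters imaged in both CC and MLO views.
detections = [(True, True), (True, False), (False, True), (False, False)]
film = film_based_sensitivity(detections)
case = case_based_sensitivity(detections)
```

Because a miss in one view can be compensated by a hit in the other, the case-based sensitivity is
never lower than the film-based sensitivity computed on the same clusters.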

INTRODUCTION


Although the five-year survival rate for breast cancer has improved over the years,
possibly due to screening programs, breast cancer remains one of the most common
cancers among women in the Western world (1). When breast cancer is detected in its
localized stage, the five-year survival rate is 97% (2). The five-year survival rate drops to
20% if it has metastasized. Screening mammography is currently the best tool available for
the early detection of breast cancer (3). Although the sensitivity of mammography is
relatively high compared with other breast imaging modalities, the false negative
detection rate is still as high as 15 to 30%. Double reading has been shown to improve the
sensitivity (4); however, it is not cost effective in a clinical setting. Computer-aided
diagnosis (CAD) can provide a second opinion and can improve the detection accuracy
significantly (5-8).

Several  research  groups  have  developed  CAD  programs  for  the  detection  of 
microcalcifications  using  different  approaches.  Each  approach  employs  a  number  of 
parameters  that  are  usually  determined  during  the  development  of  the  CAD  program.  For 
instance,  the  neighborhood  size  for  the  normalization  of  local  contrast  in  reference  (9), 
and  the  signal-to-noise  ratio  (SNR)  to  determine  the  locally  adaptive  threshold  in 
reference  (10)  are  some  of  the  CAD  program  parameters.  Generally,  these  parameters  are 
chosen  by  experimenting  with  their  values  manually  until  a  satisfactory  performance  is 
achieved.  However,  there  is  no  guarantee  that  these  parameters  will  reach  their  optimum 
values  with  the  trial-and-error  approach. 


In  order  to  set  the  parameters  of  the  CAD  systems  automatically  in  an  optimal  manner, 
several  approaches  have  been  proposed.  Anastasio  et  al.  used  a  genetic  algorithm  (GA) 
based  optimization  method  to  select  the  values  of  10  parameters  in  a  rule-based 
microcalcification  detection  system  (11).  GA  searches  a  parameter  space  using  an  ad  hoc 
cost function. By performing training and resubstitution on a data set with 89 images,
they  observed  that  the  optimization  increased  the  sensitivity  of  the  CAD  system  from 
80%  to  87%  at  an  FP  rate  of  1.0  per  image.  Many  CAD  systems  are  composed  of  several 
independent  yet  interrelated  parts.  Some  optimization  studies  in  the  CAD  area  involved 
optimizing  one  part  of  the  CAD  system.  For  example,  Sahiner  et  al.  used  GA  and  a 
specially  designed  cost  function  to  select  features  that  could  enhance  the  high  sensitivity 
performance  of  a  classifier  for  classification  of  malignant  and  benign  masses  (12).  Chan 
et  al.  used  GA  to  optimize  features  for  differentiation  of  malignant  and  benign 
microcalcifications  (13).  Leichter  et  al.  used  feature  selection  to  optimize  the 
performance  of  microcalcification  characterization  (14).  Yoshida  et  al.  optimized  the 
wavelet  transform  for  microcalcification  detection  based  on  supervised  learning  (15). 
Tsai  et  al.  used  GA  to  determine  the  optimal  set  of  fuzzy  membership  functions  to 
classify  myocardial  heart  disease  from  ultrasound  images  (16).  Recently,  we  proposed 
and  compared  several  automated  techniques  for  the  selection  of  optimal  neural  network 
architecture  for  CAD  (17-20).  In  this  work,  we  evaluated  the  effect  of  the  CNN 
architecture  selected  by  the  automated  optimization  technique  on  microcalcification 
detection  performance  in  comparison  with  our  previously  manually  selected  architecture. 
The  performances  were  evaluated  using  a  publicly  available,  relatively  large  and 
completely  independent  data  set  of  digitized  mammograms. 

MATERIALS  and  METHODS 

A.  Data  Set 

The  data  set  of  108  mammograms  used  for  the  optimization  and  training  of  the  CNN 
architecture  was  part  of  our  own  database  collected  with  Institutional  Review  Board 
approval  at  the  University  of  Michigan.  For  validation  purposes  we  used  another  data  set 
consisting  of  152  mammograms,  which  was  also  part  of  our  own  database  but  different 
from the set used for training. The mammograms in our database were randomly selected
from  the  files  of  screening  and  diagnostic  patients  who  had  undergone  biopsy  at  the 
University  of  Michigan  so  that  they  included  microcalcifications  with  a  wide  range  of 
characteristics  similar  to  those  encountered  in  clinical  practice.  The  training  and  the 
validation  data  sets  were  digitized  with  a  Lumisys  85  laser  scanner  (Lumisys,  Sunnyvale, 
CA).  For  test  purposes,  an  independent  data  set  was  used.  This  data  set  included  472 
digitized  mammograms,  selected  from  the  University  of  South  Florida  (USF)  digitized 
mammogram  database,  which  is  publicly  available  over  the  internet  (21).  From  all  the 
available  cases  in  this  database,  only  malignant  cases  that  were  digitized  with  the 
Lumisys  200  laser  scanner  were  selected  (volumes:  cancer_01,  cancer_02,  cancer_05, 
cancer_09, and cancer_15). The mammograms were digitized at a pixel resolution of 0.05
mm x 0.05 mm with 4096 gray levels. We converted these images to 0.1 mm x 0.1 mm
resolution by averaging adjacent 2 x 2 pixels and subsampling. The detection was carried
out  on  these  0.1-mm  resolution  images.  The  optical  density  (OD)  range  of  the  scanner 
was  0-3.6.  The  digitizer  was  calibrated  so  that  the  gray  values  were  linearly  and  inversely 
proportional  to  the  OD  with  a  slope  of  -0.001  OD/pixel  value.  Details  of  the  case 
collection method are described on the USF website (21).
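
As an illustration only, the conversion from 0.05-mm to 0.1-mm pixels by averaging adjacent 2 x 2 pixels might be sketched as follows in Python with NumPy. The function name and the handling of odd image dimensions (dropping the last row or column) are assumptions, not details taken from the original program.

```python
import numpy as np


def downsample_2x2(image: np.ndarray) -> np.ndarray:
    """Average non-overlapping 2 x 2 pixel blocks (hypothetical helper).

    Halves the sampling rate in each direction, e.g. from 0.05-mm to
    0.1-mm pixels.
    """
    rows, cols = image.shape
    # Assumption: drop a trailing odd row/column so the image tiles exactly.
    rows -= rows % 2
    cols -= cols % 2
    blocks = image[:rows, :cols].astype(np.float64)
    blocks = blocks.reshape(rows // 2, 2, cols // 2, 2)
    # Mean over the two within-block axes gives one value per 2 x 2 block.
    return blocks.mean(axis=(1, 3))
```

Averaging before subsampling, rather than simply discarding every other pixel, reduces the noise in the lower-resolution image.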

Types of the microcalcifications in the selected cases included punctate, amorphous,
pleomorphic, round and regular, fine linear branching, round, and dystrophic. The
distributions  of  the  calcifications  were  clustered,  linear,  segmental,  and  regional.  The 
lesion  types,  the  assessment,  the  subtlety,  and  the  pathology  were  provided  with  the 
database.  The  cluster  location(s)  was  marked  on  each  image  as  an  overlay  file.  There 
were  a  total  of  272  microcalcification  clusters,  253  of  which  were  biopsy-proven 
malignant.  Figure  1  shows  the  distribution  of  the  assessment  code  for  the  malignant 
clusters  used  in  this  study.  The  assessment  was  provided  in  the  USF  database  and  it 
follows  the  American  College  of  Radiology  breast  imaging  reporting  and  data  system 
(BIRADS) categories. This distribution shows that most of the clusters are in the
“actionable lesion” category, which is defined as a BIRADS assessment score of 0, 4, or
5. These scores require patient callback for additional mammograms or biopsy. Figure 2
shows the breast density information. A majority of the clusters come from breasts with
densities  of  2  and  3.  Figure  3  shows  the  distribution  of  the  subtlety  rating  for  the 
malignant  clusters.  There  is  no  BIRADS  standard  for  the  subtlety  rating.  In  the  USF 
database,  a  cluster  with  a  subtlety  rating  of  1  is  the  most  obvious  while  a  cluster  with  a 
subtlety  rating  of  5  is  the  subtlest.  It  may  be  concluded  from  this  distribution  that  the 
majority  of  the  clusters  used  in  this  study  could  be  classified  as  subtle. 

B.  Microcalcification  Detection  Program 

We  have  developed  a  computer  program  to  automatically  detect  microcalcification 
clusters  on  digitized  mammograms  (5,  6).  The  program  has  three  major  steps.  The  first 
step  is  preprocessing  in  which  the  breast  boundary  is  automatically  determined  and  the 
breast region is filtered with a band-pass filter to obtain a signal-to-noise ratio (SNR) enhanced
image.  The  second  step  is  segmentation.  In  this  step,  potential  microcalcification 
locations  are  determined  using  global  and  locally  adaptive  thresholding  methods.  The 
local  threshold  is  calculated  as  the  product  of  the  local  root-mean-square  (RMS)  noise 
and  an  input  SNR  threshold.  The  microcalcification  size,  maximum  contrast  and  SNR  are 
also calculated. In the third step, the extracted signals are classified as either a true
microcalcification  (TP)  or  a  false-positive  (FP)  signal.  The  first  stage  is  a  rule-based 
classification  that  uses  the  size,  contrast  and  SNR  information  to  generate  decision  rules. 
The  second-stage  classification  uses  a  trained  convolution  neural  network  (CNN) 
classifier  to  recognize  the  abnormal  patterns.  Finally,  regional  clustering  is  used  to 
identify  clusters  of  signals.  If  a  TP  signal  is  within  a  neighborhood  of  other  TP  signals, 
they  are  combined  to  form  a  cluster.  Previously,  we  have  found  that  the  CNN  could 
effectively reduce the number of FPs and therefore improve the accuracy of the
microcalcification  detection  program  (10). 
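
The locally adaptive thresholding rule described above (local threshold = local RMS noise x input SNR threshold) could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the window size, the use of the local standard deviation as the RMS noise estimate, the comparison of each pixel against the local mean, and the function name are all hypothetical and are not taken from the original program.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_adaptive_mask(enhanced, snr_threshold=3.0, window=9):
    """Flag pixels whose excess over the local mean exceeds
    snr_threshold * local RMS noise (illustrative sketch only).

    `window` must be odd; it is an assumed neighborhood size.
    """
    enhanced = np.asarray(enhanced, dtype=np.float64)
    half = window // 2
    # Reflect-pad so that every pixel has a full neighborhood.
    padded = np.pad(enhanced, half, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    local_mean = windows.mean(axis=(2, 3))
    # Assumption: RMS noise estimated as the local standard deviation.
    local_rms = windows.std(axis=(2, 3))
    threshold = snr_threshold * local_rms
    return (enhanced - local_mean) > threshold
```

Lowering `snr_threshold` admits more candidate signals, which is the mechanism used later to trace out different operating points on the FROC curve.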

C.  Convolution  Neural  Network 

The  CNN  is  based  on  the  neocognitron  structure  of  Fukushima  (22).  It  was  previously 
used  for  detection  of  lung  nodules  on  chest  radiographs,  detection  of  microcalcifications 
on  mammograms,  and  classification  of  mass  and  normal  breast  tissue  on  mammograms 
(10,  23,  24).  Figure  4  shows  a  schematic  representation  of  the  CNN  structure.  The  input 
to  the  CNN  is  a  region  of  interest  (ROI)  image,  extracted  for  each  of  the  detected  signals. 
The  nodes  in  the  hidden  layers  are  arranged  in  groups;  each  group  functions  like  a  filter 
kernel.  The  CNN  classifies  the  input  ROI  as  a  TP  or  an  FP.  The  output  node  value  is 
close  to  one  for  true  microcalcifications  and  is  close  to  zero  for  FP  signals.  In  this  work, 
the  CNN  had  one  input  node,  two  hidden  layers  and  one  output  node.  All  node  groups  in 
the  two  hidden  layers  were  fully  connected.  The  images  in  each  layer  were  convolved 
with  the  filter  kernels  to  obtain  the  pixel  values  in  the  images  to  be  transferred  to  the 
following layer. There were N1 node groups in the first hidden layer, and N2 node groups in the
second hidden layer. The kernel sizes of the first group of filters between the input node
and the first hidden layer were K1 x K1, and those of the second group of filters between
the first and second hidden layers were K2 x K2. Sigmoidal activation functions were used
and  the  CNN  was  trained  using  the  error  back-propagation  rule. 
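
To make the role of the four architecture parameters concrete, a forward pass through a network of this general shape might be sketched as below. This is not the authors' implementation: the "valid" (unpadded) filtering, the bias terms, the full connection of the output node to every pixel of the second hidden layer, and the random weights are assumptions made for illustration, and all names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv_layer(inputs, kernels, biases):
    """inputs: (n_in, H, W); kernels: (n_out, n_in, K, K); biases: (n_out,).

    Every output group is connected to every input group (full connection).
    Returns sigmoid activations of shape (n_out, H-K+1, W-K+1).
    """
    k = kernels.shape[-1]
    # windows has shape (n_in, H-K+1, W-K+1, K, K)
    windows = sliding_window_view(inputs, (k, k), axis=(1, 2))
    # Sliding weighted sum (cross-correlation) summed over input groups.
    summed = np.einsum("ihwab,oiab->ohw", windows, kernels)
    return sigmoid(summed + biases[:, None, None])


def random_params(n1, n2, k1, k2, roi_size=16, seed=0):
    """Random weights for an N1-N2-K1-K2 architecture (illustration only)."""
    rng = np.random.default_rng(seed)
    size2 = roi_size - k1 + 1 - k2 + 1  # side length of second-layer images
    return {
        "k1": rng.normal(scale=0.1, size=(n1, 1, k1, k1)),
        "b1": np.zeros(n1),
        "k2": rng.normal(scale=0.1, size=(n2, n1, k2, k2)),
        "b2": np.zeros(n2),
        "w_out": rng.normal(scale=0.1, size=(n2, size2, size2)),
        "b_out": 0.0,
    }


def cnn_forward(roi, params):
    """Map one ROI to a scalar in (0, 1); near 1 is meant to indicate a TP."""
    h1 = conv_layer(roi[None, :, :], params["k1"], params["b1"])
    h2 = conv_layer(h1, params["k2"], params["b2"])
    return float(sigmoid(np.sum(h2 * params["w_out"]) + params["b_out"]))
```

Under these assumptions, a 16 x 16 ROI passed through a 14-10-5-7 architecture yields 14 images of 12 x 12 pixels in the first hidden layer and 10 images of 6 x 6 pixels in the second.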

D.  Neural  Network  Architecture  Selection 

The  CNN  architecture  used  in  our  earlier  studies  was  selected  using  a  manual 
optimization  technique  (10).  We  recently  evaluated  the  use  of  automated  optimization 
methods  for  selecting  an  optimal  CNN  architecture.  Details  of  the  automated  architecture 
selection  study  have  been  described  in  the  literature  (20).  Briefly,  three  automated 
methods,  the  steepest  descent  (SD),  the  simulated  annealing  (SA),  and  the  genetic 
algorithm (GA), were compared. Four main parameters of the CNN architecture, N1, N2,
K1, and K2, were considered for optimization. The area, Az, under the receiver operating
characteristic  (ROC)  curve  was  used  to  design  a  cost  function.  The  SA  experiments  were 
conducted  with  four  different  annealing  schedules.  Three  different  parent  selection 
methods  were  compared  for  the  GA  experiments.  Our  training  data  set  consisted  of 
region-of-interest  (ROI)  images  extracted  from  108  mammograms,  described  above.  The 
locations  of  individual  microcalcifications  in  these  images  were  manually  identified  and 
saved  in  a  truth  file.  After  the  prescreening  steps  of  the  microcalcification  detection 
program  (10),  the  detected  signals  were  labeled  as  TP  or  FP  automatically  by  comparing 
with the truth file. A 16 x 16-pixel ROI was then extracted for each of the detected signals
and  these  ROI  images  were  used  for  training  and  testing  the  CNN.  Either  a  true  or  a  false 
microcalcification  was  located  at  the  center  of  the  ROI.  The  microcalcification  detection 
program  detected  more  FP  ROIs  than  TP  ROI  images  at  the  prescreening  stage.  In  order 
to  have  approximately  equal  numbers  of  TP  and  FP  ROIs,  only  a  randomly  selected 
subset  of  FP  ROI  images  was  used.  The  selected  ROIs  were  divided  into  two  separate 
groups. For the first part of the experiments, the first group, G1, was used for training the
CNN and the second group, G2, was used for testing the trained CNN. For the second
part of the experiments, the roles of G1 and G2 were switched. The first group, G1,
consisted of 533 TP and 553 FP ROIs. The second group, G2, had 547 TP
ROIs and 570 FP ROIs. Therefore, G1 contained a total of 1086 ROIs and G2 contained
1117 ROIs. The optimal architecture (N1-N2-K1-K2) was determined to be 14-4-5-5 when
the architecture was trained with G1 and tested with G2, and 14-10-5-7 when the training
and  the  test  sets  were  switched.  In  our  previous  study  (10),  the  optimal  architecture  was 
determined  to  be  12-8-5-3  using  a  manual  search  technique. 
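
As a rough illustration of how a scalar cost can be derived from the ROC area, the sketch below estimates Az nonparametrically from the CNN output values of the TP and FP test ROIs and converts it to a cost to be minimized. Two points are assumptions rather than the method of reference (20): the rank-based (Wilcoxon/Mann-Whitney) estimate is substituted for whatever curve-fitting procedure was actually used to obtain Az, and the form cost = 1 - Az is hypothetical.

```python
import numpy as np


def area_under_roc(tp_scores, fp_scores):
    """Nonparametric estimate of the ROC area.

    Equals the fraction of (TP, FP) pairs in which the TP ROI receives
    the higher CNN output; ties count as one half.
    """
    tp = np.asarray(tp_scores, dtype=float)[:, None]
    fp = np.asarray(fp_scores, dtype=float)[None, :]
    wins = (tp > fp).sum() + 0.5 * (tp == fp).sum()
    return wins / (tp.size * fp.size)


def architecture_cost(tp_scores, fp_scores):
    """Hypothetical cost for one candidate architecture: lower is better."""
    return 1.0 - area_under_roc(tp_scores, fp_scores)
```

An optimizer such as SD, SA, or GA would then train each candidate (N1, N2, K1, K2) on one group of ROIs, score the other group, and compare candidates by this cost.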

RESULTS


In  addition  to  the  108  mammograms  for  the  training  set,  we  used  a  data  set  of  152 
mammograms  for  the  validation  of  the  selected  CNN  architectures.  In  this  data  set  there 
were  62  mammograms  with  at  least  one  malignant  microcalcification  cluster  and  90 
normal  images  that  were  free  of  clustered  microcalcifications.  The  first  two  steps  of  the 
microcalcification  detection  program  were  run  on  these  images.  The  outputs  of  these 
steps  provided  the  potential  microcalcification  locations.  For  the  last  step,  classification 
was  run  three  times  with  different  CNN  architectures.  In  the  first  run,  the  manually 
optimized  architecture  12-8-5-3  and  its  neural  network  weights  were  used.  In  the  second 
and  third  runs,  the  two  automatically  optimized  architectures,  14-4-5-5  and  14-10-5-7, 
and  their  corresponding  weights  were  used,  respectively.  For  each  run,  the  detection 
outputs were calculated for three different SNR thresholds: 2.8, 2.9, and 3.0. The sensitivity
was  calculated  from  the  62  abnormal  mammograms  and  the  FP  rates  were  estimated  from 
the  detection  output  on  the  90  normal  images.  The  outputs  from  these  three  runs  were 
used  to  determine  the  FROC  curves  that  are  compared  in  Figure  5.  The  comparison 
indicates that the first optimal architecture (14-4-5-5) generally results in much lower FP
rates; however, it also reduces the number of TP clusters and thus the
sensitivity. The second optimal architecture (14-10-5-7) presents a substantial
improvement  in  terms  of  both  higher  sensitivity  and  lower  FP  rate.  For  instance,  the 
sensitivity  increases  from  78.7%  to  84.2%  at  0.7  FP  per  image.  Therefore,  these 
validation  results  indicate  that  the  best  CNN  architecture  is  the  second  optimal 
architecture.  We  tested  the  performance  of  this  architecture  on  the  independent  data  set 
that was described in Section II.A.

In  order  to  test  the  performance  of  the  selected  optimal  architecture,  the  detection 
program  was  run  at  seven  SNR  threshold  values  varying  between  2.6  and  3.2  at 
increments of 0.1. Figure 6 shows the FROC curves of the microcalcification detection
program  using  both  the  manually  optimized  and  automatically  optimized  CNN 
architectures.  The  FP  rate  was  estimated  by  performing  the  detection  on  the  normal 
mammograms.  The  automatically  optimized  architecture  again  outperformed  the 
manually optimized architecture. At an FP rate of 0.7 clusters per image, the film-based
sensitivity  is  84.6%  with  the  optimized  CNN,  in  comparison  to  77.2%  with  the  manually 
selected  CNN.  Figure  7  shows  the  FROC  curves  for  the  microcalcification  detection 
programs  if  clusters  having  images  in  both  craniocaudal  (CC)  and  mediolateral  oblique 
(MLO)  views  are  analyzed  and  a  cluster  is  considered  to  be  detected  when  it  is  detected 
in one or both views. This “case-based” scoring has been adopted for the evaluation of
some  CAD  systems  (8).  The  rationale  is  that  if  the  CAD  system  can  bring  the 
radiologist’s  attention  to  the  lesion  on  one  of  the  views,  it  will  be  unlikely  that  the 
radiologist  will  miss  the  lesion.  For  case-based  scoring  the  sensitivity  at  0.7  FPs/image  is 
93.3%  with  the  automatically  optimized  CNN  and  87.0%  with  the  manually  selected 
CNN. 

DISCUSSION 

Classification  of  true  and  false  signals  is  an  important  step  in  the  microcalcification 
detection  program.  An  optimized  CNN  can  effectively  reduce  FPs  and  improve  the 
detection  accuracy  of  the  CAD  system.  Manually  searching  for  the  optimal  CNN 
architecture  often  results  in  a  local  optimum  because  it  is  difficult  to  explore  adequately  a 
high-dimensional parameter space with manual experimentation. We have demonstrated
previously  that  an  automated  optimization  algorithm  such  as  simulated  annealing  can  find 
the  global  optimum  efficiently  (17-20). 

Our optimization is currently limited to one stage of the detection program, FP reduction with the CNN.
Our cost function was based on the Az of the CNN classifier for its
performance  in  differentiating  the  TP  and  FP  signals.  Ideally,  one  would  prefer  to 
optimize  all  parameters  in  the  detection  program  together.  In  such  a  case,  optimizing  the 
performance  in  terms  of  the  FROC  curve  will  be  necessary.  In  order  to  take  advantage  of 
some  well-established  automated  optimization  methods  such  as  GA  or  SA,  it  is  necessary 
to  define  a  scalar  cost  function.  However,  there  is  no  widely  accepted  form  of  a  scalar 
cost  function  for  the  comparison  of  FROC  curves  obtained  as  a  result  of  different 
detection  methods.  In  an  alternative  form  of  FROC  analysis,  known  as  AFROC  analysis, 
a scalar A1 is calculated, which can be considered a form of cost function, but AFROC
analysis requires a special experimental setting (25). Anastasio et al. (11) proposed an ad
hoc  cost  function,  C(f,s),  in  which  they  incorporated  their  preferences  about  their 
sensitivity-specificity  tradeoff  into  a  discrete  grid  of  numbers  on  the  sensitivity- 
specificity  plane;  the  values  in  between  these  grid  values  were  determined  by  means  of 
bilinear  interpolation.  The  fitness  of  each  solution  during  their  GA  evolution  process  was 
assigned  by  evaluation  of  the  cost  function  for  the  solution.  Since  the  cost  function 
optimized  the  FROC  curve  only  at  an  individual  operating  point  that  corresponded  to  a 
sensitivity-specificity  pair,  it  did  not  provide  sufficient  information  to  compare  two 
different  FROC  curves.  Moreover,  the  choice  of  the  preference  values  is  quite  subjective. 
For  our  optimization  study,  the  ROC  methodology,  a  commonly  accepted  form  of 
comparing  overall  classifier  performance,  was  used;  therefore,  the  cost  definition  was 
based on the area under the ROC curve, Az. To extend this definition for FROC curves,
we propose the following cost function:

$$C = 100\,(u - l) - \int_{l}^{u} s(f)\, df \qquad (1)$$

where l and u are the lower and upper limits of the FP range of interest, respectively; f is
the number of FPs per image; and s(f) is the sensitivity (in percent) at an FP rate of f. This cost function will
compare two FROC curves in a chosen range of FP rates. A similar function was
proposed by te Brake et al. to measure the quality of a feature for the discrimination of
malignant  masses  from  normal  tissue  in  digital  mammograms  (26).  In  their  definition, 
the  area  under  the  logarithmically  plotted  FROC  curve  between  0.05  and  4.0  FPs  per 
image  was  used  as  a  quality  measure: 

$$A_f = \int_{0.05}^{4.0} s(f)\, d\ln(f) = \int_{0.05}^{4.0} \frac{s(f)}{f}\, df \qquad (2)$$

where A_f is the area under the logarithmically plotted FROC curve over the chosen FP range, and f and s(f) are
defined  in  Equation  1.  As  shown  in  Figure  8,  the  cost  function  in  Equation  1  calculates 
the  area  above  the  FROC  curve  and  below  the  100%  sensitivity  line.  In  this  cost  function, 
only  the  operating  range  of  the  CAD  system  needs  to  be  defined  in  terms  of  the  FP  range. 
For  a  given  FROC  curve,  the  knowledge  of  s(f)  is  sufficient  for  the  calculation  of  the  total 
cost  function.  Thus,  this  cost  function  is  directly  related  to  the  performance  of  the  CAD 
system rather than subjective preferences of the user. Additionally, the cost definition in
Equation 1 is flexible in that one can choose the range of FPs, [l, u], along the FROC
curve for which the CAD system is to be optimized. Further studies need to be
performed  to  evaluate  the  effectiveness  of  using  the  cost  function  defined  in  Equation  1 
for  the  optimization  of  CAD  systems. 
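
A minimal numerical sketch of the two measures is given below, assuming the FROC curve is available only as a handful of measured operating points (FP rate, sensitivity in percent). The linear interpolation between operating points, the trapezoidal integration, and the function names are assumptions made for illustration.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal rule for samples y at positions x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def _interpolated(fp_rates, sensitivities, grid):
    fp = np.asarray(fp_rates, dtype=float)
    sens = np.asarray(sensitivities, dtype=float)
    order = np.argsort(fp)
    # Note: np.interp holds the end values constant outside the measured
    # FP range, so [l, u] should lie within the measured operating points.
    return np.interp(grid, fp[order], sens[order])


def froc_cost(fp_rates, sensitivities, lower, upper, n_points=201):
    """Equation 1: area above the FROC curve and below the 100% line."""
    grid = np.linspace(lower, upper, n_points)
    s = _interpolated(fp_rates, sensitivities, grid)
    return 100.0 * (upper - lower) - _trapezoid(s, grid)


def log_froc_area(fp_rates, sensitivities, lower=0.05, upper=4.0,
                  n_points=201):
    """Equation 2: area under the FROC curve plotted against ln(f)."""
    grid = np.geomspace(lower, upper, n_points)
    s = _interpolated(fp_rates, sensitivities, grid)
    return _trapezoid(s, np.log(grid))
```

For example, a curve that is flat at 80% sensitivity gives a cost of 20 over any FP range of unit width, and a lower cost indicates a better detection scheme over the chosen range.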

Of  all  the  available  images  in  the  USF  database,  we  used  only  those  scanned  by  the 
Lumisys  scanner  because  this  was  similar  to  the  scanner  that  we  used  to  acquire  digitized 
mammograms for developing our CAD programs and setting their parameters. It is not
uncommon  to  see  drastic  performance  decreases  if  different  types  of  scanners  are  used 
for  the  development  and  testing  of  a  CAD  system.  For  instance,  Velthuitzen  et  al. 
developed a microcalcification detection program using mammogram images digitized
with  a  DBA  ImageClear  R3000  (DBA  Systems  Inc,  Melbourne,  FL)  scanner  and 
achieved  94%  sensitivity  at  an  FP  rate  of  1.23  per  image  on  a  database  of  26  images  (27). 
When  they  scanned  the  same  images  with  a  Lumiscan  50  (Lumisys,  Sunnyvale,  CA) 
scanner,  and  evaluated  the  detection  performance,  the  sensitivity  dropped  to  28%  and  the 
FP  rate  increased  to  2.19  per  image.  In  this  study,  since  we  were  interested  in  evaluating 
the  performance  change  due  to  CNN  architecture  selection,  we  limited  ourselves  to  those 
images  in  the  USF  database  that  were  scanned  by  a  similar  scanner,  thereby  keeping  the 
effects  of  other  factors  on  the  performance  change  to  a  minimum.  The  dependence  of  our 
detection program on data sets acquired with different film scanners will be investigated in
the  future. 

For  this  optimization  study,  we  followed  a  three-stage  (training-validation-test)  CAD 
development  and  evaluation  methodology.  This  methodology  requires  separate  data  sets 
for  each  stage.  Table  I  summarizes  the  information  about  the  images  in  these  data  sets. 
The  images  in  the  first  two  data  sets  came  from  the  patient  files  at  the  University  of 
Michigan.  However,  these  two  data  sets  were  mutually  exclusive;  they  did  not  share  any 
common  images.  The  data  set  for  training  was  used  to  find  the  parameters  of  the  optimal 
neural  network  architecture  and  neural  network  weights.  The  images  in  the  validation  set 
were  used  to  evaluate  the  performance  of  the  selected  architectures  and  identify  the  best 
performing  architecture  for  an  independent  data  set.  Once  the  architecture  was  selected 
using  the  validation  set,  the  parameters  of  the  detection  program  were  fixed  and  no 
further  changes  were  made  either  to  the  program,  or  to  the  CNN  architecture  and  its 
weights.  Using  this  CAD  program,  microcalcification  detection  was  carried  out  on  a 
completely  independent  and  publicly  available  test  data  set.  The  images  in  this  set  were 
used  only  to  assess  the  performance  of  the  fully  specified  optimal  architecture.  If  only  a 
small training set and an “independent” test set are used, and the detection performance
on the test set is used as a guide to adjust the parameters of the detection program, there
is always a bias due to fine tuning the CAD system to this particular “test” data set that is
essentially a validation set. The results achieved with that test set may not be
generalizable to other data sets. This is an especially important consideration for CAD
system  development.  Before  a  CAD  system  can  be  considered  for  clinical 
implementation,  it  is  advisable  to  follow  this  three-stage  methodology  and  to  evaluate  the 
system  with  an  independent  random  test  set  that  contains  a  large  number  of 
mammograms  with  a  wide  spectrum  of  characteristics.  Otherwise,  the  test  results  may  not 
truly  reflect  the  actual  performance  of  the  CAD  program  in  the  unknown  patient 
population. 

The  range  of  the  SNR  thresholds  (2.8-3.0)  for  the  detection  in  the  validation  set  was 
determined  by  our  previous  experience  with  the  microcalcification  detection  program. 
This range has been shown to produce detection results within an acceptable FP range. The
range of the SNR thresholds for the detection in the test set was chosen to be wider than that
for the validation set in order to compare a wider section of the FROC curve. A smaller
value of SNR threshold will generally result in more potential signals to be considered for
detection. Thus, the sensitivity is usually higher but the number of FP clusters also
increases.  On  the  other  hand,  a  larger  value  of  SNR  threshold  generally  reduces  the 
number  of  FP  clusters  but  this  usually  comes  with  a  decrease  in  the  sensitivity.  Although 
the  SNR  threshold  can  assume  any  positive  value,  very  small  values  may  not  always 
extend  the  FROC  curve  much  further  beyond  its  current  limits  because  at  very  low 
thresholds  the  potential  signals  are  merged  with  the  background  and  the  noisy 
background  also  merges  into  large  patches.  At  very  high  thresholds,  even  obvious 
microcalcifications  may  be  missed  and  the  sensitivity  will  drop  rapidly. 

The scoring of the microcalcification detection program was performed automatically.
Figure  9  demonstrates  how  our  automatic  scoring  scheme  was  designed.  There  are  two 
sets  of  inputs  to  the  automatic  scoring  program.  The  first  set  consists  of  the  overlay  files 
where  the  extent  of  each  microcalcification  cluster  is  drawn  by  an  expert  radiologist  as  a 
polygon.  The  second  set  consists  of  outputs  of  the  automated  microcalcification  detection 
program,  which  are  the  smallest  rectangular  bounding  boxes  enclosing  the  detected 
microcalcification  clusters.  The  scoring  program  automatically  calculates  the  intersection 
of  the  areas  enclosed  by  these  rectangles  and  the  polygons.  If  the  ratio  of  the  intersection 
area  to  either  the  rectangle  or  the  polygon  area  is  more  than  40%,  then  the  cluster 
enclosed  by  the  polygon  is  considered  to  be  detected.  If  a  polygon  area  is  detected  by 
more  than  one  rectangular  region,  only  one  TP  is  recorded.  The  sensitivity  for  the  film- 
based  FROC  curve  was  determined  based  on  the  number  of  malignant  clusters  detected 
relative  to  the  total  number  of  malignant  clusters  present  in  the  data  set,  considering 
different  views  of  the  same  cluster  to  be  independent.  For  case-based  scoring,  the 
corresponding  clusters  in  the  two  views  are  used  to  determine  if  the  same  cluster  is 
detected  by  the  CAD  system  in  at  least  one  view.  Detection  of  the  same  cluster  in  one  or 
both  views  will  be  scored  as  one  TP  and  the  sensitivity  is  normalized  to  the  total  number 
of  different  malignant  clusters  in  the  data  set. 
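
The overlap criterion described above can be sketched as follows. This is an illustrative approximation only: areas are measured by counting pixel centers on a grid rather than by exact polygon clipping, "either" is interpreted as a logical OR of the two ratios, and all function names are hypothetical.

```python
import numpy as np


def _pixel_centers(shape):
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return xs + 0.5, ys + 0.5


def polygon_mask(vertices, shape):
    """Rasterize a polygon given as (x, y) vertices with the even-odd rule."""
    px, py = _pixel_centers(shape)
    inside = np.zeros(shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y0 > py) != (y1 > py)
        x_at_y = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at_y)
    return inside


def rectangle_mask(rect, shape):
    """rect = (x_min, y_min, x_max, y_max) of a detected bounding box."""
    x_min, y_min, x_max, y_max = rect
    px, py = _pixel_centers(shape)
    return (px >= x_min) & (px < x_max) & (py >= y_min) & (py < y_max)


def score_image(polygons, rects, shape, min_ratio=0.40):
    """Return (TP, FN, FP) counts for one image."""
    poly_masks = [polygon_mask(p, shape) for p in polygons]
    rect_masks = [rectangle_mask(r, shape) for r in rects]
    poly_hit = [False] * len(polygons)
    rect_hit = [False] * len(rects)
    for i, pm in enumerate(poly_masks):
        for j, rm in enumerate(rect_masks):
            inter = np.count_nonzero(pm & rm)
            if inter == 0:
                continue
            # Detected if the overlap exceeds 40% of either area.
            if inter / pm.sum() > min_ratio or inter / rm.sum() > min_ratio:
                poly_hit[i] = True  # several boxes on one cluster -> one TP
                rect_hit[j] = True
    tp = sum(poly_hit)
    return tp, len(polygons) - tp, len(rects) - sum(rect_hit)
```

Marking a polygon as hit at most once reproduces the rule that a cluster detected by more than one rectangle contributes only one TP.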

At present, there is no established statistical test for assessing the significance of the
difference between two FROC curves. Therefore, we cannot provide a statistical
significance  evaluation  on  the  improvement  in  the  FROC  curves  with  the  optimized 
CNN.  However,  since  the  increase  in  the  sensitivity  is  substantial,  from  77.2%  to  84.6% 
at  0.7  FP  per  image,  and  is  consistent  over  the  range  of  FP  studied,  the  effectiveness  of 
the  CNN  is  evident.  Furthermore,  since  the  improvement  is  observed  for  a  relatively 
large independent test set, and is consistent with the performance observed with the
validation set, this reduces the likelihood that the improvement is biased to the specific
data set.


CONCLUSION 

We  have  developed  a  CAD  system  for  the  detection  of  microcalcification  clusters  on 
digitized  mammograms.  In  this  study,  we  have  evaluated  the  effectiveness  of  an  optimal 
neural  network  architecture  selected  by  an  automated  simulated  annealing  optimization 
technique  for  improving  the  performance  of  the  CAD  system.  At  an  FP  rate  of  0.7  per 
image,  the  film-based  sensitivity  is  84.6%  with  the  optimized  CNN,  in  comparison  with 
77.2%  with  a  manually  selected  CNN.  If  clusters  having  images  in  both  craniocaudal 
(CC)  and  mediolateral  oblique  (MLO)  views  are  analyzed  and  a  cluster  is  considered  to 
be  detected  when  it  is  detected  in  one  or  both  views,  at  0.7  FPs/image,  the  sensitivity  is 
93.3%  with  the  optimized  CNN  and  87.0%  with  the  manually  selected  CNN.  This  study 
demonstrates  that  classification  of  true  and  false  signals  is  an  important  step  in  the 
microcalcification  detection  program  and  an  optimized  CNN  can  effectively  reduce  FPs 
and  improve  the  detection  accuracy  of  the  CAD  system. 

FIGURE CAPTIONS


Figure  1.  Distribution  of  the  assessment  rating  for  the  clusters  used  in  our  test  data  set. 
The  assessment  follows  the  ACR  BIRADS  standard  and  was  provided  with  the  USF 
database.  Because  only  biopsy-proven  malignant  clusters  were  included  in  this  test 
set,  the  clusters  have  a  BIRADS  evaluation  of  3  (Probably  benign  finding  -  short 
interval  follow-up  suggested),  4  (Suspicious  abnormality  -  biopsy  should  be 
considered),  and  5  (Highly  suggestive  of  malignancy  -  appropriate  action  should  be 
taken). 

Figure  2.  Breast  density  information  for  the  mammograms  included  in  the  test  data  set. 
The  breast  density  information  follows  the  BI-RADS  standard  and  was  provided  with 
the  USF  database:  1  =  almost  entirely  fat,  2  =  with  scattered  fibroglandular  densities, 
3  =  heterogeneously  dense,  4  =  extremely  dense. 

Figure  3.  Subtlety  ranking  (1  =  obvious,  5  =  subtle)  of  the  253  clusters  provided  with  the 
USF  data  set. 

Figure  4.  Schematic  diagram  of  the  architecture  of  a  convolution  neural  network.  The 
input  to  the  CNN  is  a  region  of  interest  image  extracted  for  each  of  the  detected 
signals.  The  output  is  a  scalar  that  is  the  relative  rating  by  the  CNN  representing  the 
likelihood  that  the  input  ROI  contains  a  true  microcalcification  or  a  false-positive 
signal. 

Figure  5.  Comparison  of  validation  FROC  curves  for  detection  of  clustered 
microcalcifications  using  different  CNN  architectures:  (a)  manually  optimized 
architecture  (12-8-5-3),  (b)  automatically  optimized  architecture  1  (14-4-5-5),  (c) 
automatically  optimized  architecture  2  (14-10-5-7).  The  evaluation  was  performed 
using  the  152-image  validation  data  set  and  three  SNR  thresholds  (2.8, 2.9,  and  3.0). 

Figure  6.  Comparison  of  test  FROC  curves  for  detection  of  clustered  microcalcifications 
with  manually  and  automatically  optimized  CNN  architectures  for  film-based  (single 
view)  scoring.  The  automatically  optimized  architecture  is  14-10-5-7.  The  evaluation 
was  performed  using  the  472-image  test  data  set  and  at  seven  SNR  thresholds 
(between  2.6  and  3.2  varying  at  increments  of  0.1). 

Figure  7.  Comparison  of  test  FROC  curves  for  detection  of  clustered  microcalcifications 
with  manually  and  automatically  optimized  CNN  architectures  for  case-based 
scoring.  In  case-based  scoring,  if  clusters  having  images  in  both  CC  and  MLO  views 
are  analyzed,  a  cluster  is  considered  to  be  detected  when  it  is  detected  in  one  or  both 
views.  The  automatically  optimized  architecture  is  14-10-5-7.  The  evaluation  was 
performed  using  the  472-image  test  data  set  (236  two-view  mammograms)  and  at 
seven  SNR  thresholds  (between  2.6  and  3.2  varying  at  increments  of  0.1). 

Figure 8. Definition of a scalar cost function for optimization of a CAD system. l and u are
the lower and upper limits of the range of the number of FPs per image on the FROC
curve, respectively, f is the number of FPs per image, and s(f) is the sensitivity at an FP
rate of f. The cost, C, is determined as the area above the FROC curve and below the
100%  sensitivity  line.  This  area  is  shaded  in  the  figure. 

Figure 9. In this schematic mammogram, there are four microcalcification clusters (C1,
C2, C3, C4), the extents of which are drawn by radiologists. The microcalcification
detection program detects five clusters (D1, D2, D3, D4, D5). D1 is a TP detection. D2
and D3 are FP detections because D2 does not intersect with any cluster and D3's
intersection with C3 is less than 40%, which was chosen as the threshold for detection
during training and validation of the automatic scoring criteria. D4 and D5 are
considered to be detecting the same cluster, C4. Therefore, for this example, the
number of TPs is 2 (C1, C4), the number of false-negatives is 2 (C2, C3), and the
number of FPs is 2 (D2 and D3).

Figure 1. Distribution of the assessment rating for the clusters used in our test data set.
The assessment follows the ACR BI-RADS standard and was provided with the USF
database. Because only biopsy-proven malignant clusters were included in this test set,
the clusters have a BI-RADS evaluation of 3 (Probably benign finding - short interval
follow-up  suggested),  4  (Suspicious  abnormality  -  biopsy  should  be  considered),  and  5 
(Highly suggestive of malignancy - appropriate action should be taken).

Figure 2. Breast density information for the mammograms included in the test data set.
The  breast  density  information  follows  the  BI-RADS  standard  and  was  provided  with  the 
USF  database:  1  =  almost  entirely  fat,  2  =  with  scattered  fibroglandular  densities,  3  = 
heterogeneously dense, 4 = extremely dense.

[Figure 3 plot; axis label: subtlety ranking]

Figure 3. Subtlety ranking (1 = obvious, 5 = subtle) of the 253 clusters provided with the
USF data set.

Figure 4. Schematic diagram of the architecture of a convolution neural network. The
input  to  the  CNN  is  a  region  of  interest  image  extracted  for  each  of  the  detected  signals. 
The  output  is  a  scalar  that  is  the  relative  rating  by  the  CNN  representing  the  likelihood 
that the input ROI contains a true microcalcification or a false-positive signal.

Figure 5. Comparison of validation FROC curves for detection of clustered
microcalcifications  using  different  CNN  architectures:  (a)  manually  optimized 
architecture (12-8-5-3), (b) automatically optimized architecture 1 (14-4-5-5), (c)
automatically optimized architecture 2 (14-10-5-7). The evaluation was performed using
the 152-image validation data set and three SNR thresholds (2.8, 2.9, and 3.0).

[Figure 6 plot; axis label: number of false-positive clusters per image]

Figure 6. Comparison of test FROC curves for detection of clustered microcalcifications
with manually and automatically optimized CNN architectures for film-based (single
view) scoring. The automatically optimized architecture is 14-10-5-7. The evaluation was
performed using the 472-image test data set and at seven SNR thresholds (between 2.6
and 3.2 varying at increments of 0.1).

Figure 7. Comparison of test FROC curves for detection of clustered microcalcifications
with  manually  and  automatically  optimized  CNN  architectures  for  case-based  scoring.  In 
case-based  scoring,  if  clusters  having  images  in  both  CC  and  MLO  views  are  analyzed,  a 
cluster  is  considered  to  be  detected  when  it  is  detected  in  one  or  both  views.  The 
automatically  optimized  architecture  is  14-10-5-7.  The  evaluation  was  performed  using 
the  472-image  test  data  set  (236  two-view  mammograms)  and  at  seven  SNR  thresholds 
(between 2.6 and 3.2 varying at increments of 0.1).

[Figure 8 plot; axis label: s(f)]

Figure 8: Definition of a scalar cost function for optimization of CAD system. l and u are
the lower and upper limits of the range of the number of FPs per image on the FROC
curve, respectively, f is the number of FPs per image, and s(f) is the sensitivity at an FP
rate of f. The cost, C, is determined as the area above the FROC curve and below the 100%
sensitivity line. This area is shaded in the figure.
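
The cost described in the Figure 8 caption can be approximated numerically from sampled FROC points. The following is a minimal sketch under stated assumptions: the function name froc_cost, the representation of the curve as sampled (f, s(f)) pairs, and the linear interpolation between samples are illustrative choices and are not taken from the study.

```python
def froc_cost(fp_rates, sensitivities, lower, upper):
    """Approximate C = integral from lower to upper of (1 - s(f)) df.

    fp_rates: non-decreasing FPs-per-image values sampled on the FROC curve.
    sensitivities: sensitivity (0..1) at each sampled FP rate.
    Linear interpolation between samples is an assumption of this sketch.
    """
    if len(fp_rates) != len(sensitivities) or len(fp_rates) < 2:
        raise ValueError("need at least two matching FROC samples")
    if not (fp_rates[0] <= lower < upper <= fp_rates[-1]):
        raise ValueError("integration limits must lie within the sampled range")

    def s_at(f):
        # Linearly interpolate the sensitivity at FP rate f.
        samples = list(zip(fp_rates, sensitivities))
        for (f0, s0), (f1, s1) in zip(samples, samples[1:]):
            if f0 <= f <= f1:
                if f1 == f0:
                    return s0
                return s0 + (s1 - s0) * (f - f0) / (f1 - f0)
        raise ValueError("FP rate outside sampled range")

    # Integration nodes: both limits plus every sample strictly between them.
    nodes = [lower] + [f for f in fp_rates if lower < f < upper] + [upper]
    cost = 0.0
    for a, b in zip(nodes, nodes[1:]):
        # Trapezoidal rule applied to the gap below the 100% sensitivity line.
        cost += 0.5 * ((1.0 - s_at(a)) + (1.0 - s_at(b))) * (b - a)
    return cost
```

A lower cost corresponds to a curve that lies closer to the 100% sensitivity line over the chosen range of FP rates.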


Figure 9. In this schematic mammogram, there are four microcalcification clusters (C1,
C2, C3, C4), the extents of which are drawn by radiologists. The microcalcification
detection program detects five clusters (D1, D2, D3, D4, D5). D1 is a TP detection. D2 and
D3 are FP detections because D2 does not intersect with any cluster and D3's intersection
with C3 is less than 40%, which was chosen as the threshold for detection during training
and validation of the automatic scoring criteria. D4 and D5 are considered to be detecting
the same cluster, C4. Therefore, for this example, the number of TPs is 2 (C1, C4), the
number of false-negatives is 2 (C2, C3), and the number of FPs is 2 (D2 and D3).

Data Set      Source                         No. of Images    No. of Malignant Microcalc. Clusters
Training      University of Michigan         108              29
Validation    University of Michigan         152              76
Test          University of South Florida    472              253

Table  I.  Summary  of  the  data  sets  used  in  the  different  stages  (training,  validation,  test)  of 
this  study.  These  data  sets  are  mutually  exclusive,  i.e.,  there  is  no  overlap  of  images  in 
the three data sets.

ABSTRACT


Purpose:  We  have  developed  a  computer-aided  detection  (CAD)  algorithm  to  detect  breast 
masses  on  digitized  mammograms.  In  this  study,  we  analyzed  the  performance  of  the  algorithm 
with  clinical  cases. 

Materials  and  Methods:  A  digitized  mammogram  is  processed  with  an  adaptive  enhancement 
filter  followed  by  a  local  border  refinement  stage.  Features  are  then  extracted  from  each  detected 
structure  and  used  to  identify  potential  masses.  We  evaluated  the  algorithm’s  performance  on 
independent  cases  obtained  from  263  patients  from  two  institutions.  The  CAD  marker  rate  was 
estimated  by  applying  the  algorithm  to  503  normal  films. 

Results:  The  computer  detected  a  malignant  mass  in  83%  (130/156)  of  the  malignant  cases 
at  a  marker  rate  of  1.0  marks  per  film.  The  detection  accuracy  for  benign  lesions  was  lower  than 
that  for  malignant  masses.  Free-response  receiver  operating  characteristic  (FROC)  performance 
curves are obtained and the tradeoff between detection sensitivity and the number of CAD marks
is analyzed. A performance comparison between cases collected at the two different institutions
is also included.

Conclusion:  Our  mass  detection  algorithm  has  a  high  sensitivity  for  detection  of  malignant 
masses. It may be useful as a second opinion in mammographic interpretation.

Keywords:  Computer-Aided  Diagnosis,  Preclinical  Evaluation,  Mass  Detection,  Breast  Cancer, 
Mammography 


INTRODUCTION 


Breast cancer is one of the leading causes of death among American women between 40 and
55 years of age (1). Women who undergo regular mammographic screening have a statistically
significant  reduction  in  breast  cancer  mortality  compared  to  women  who  do  not  undergo 
screening  (2).  In  addition,  independent  double  reading  by  two  radiologists  increases  the 
sensitivity  of  mammographic  screening  (3).  Results  of  studies  indicate  that  a  4%-15%  increase 
in  detected  cancers  is  possible  with  double  reading  (3-5).  However,  the  higher  cost  and 
increased  workload  may  make  double  reading  by  two  radiologists  impractical  in  a  general 
screening  situation.  Computer-aided  diagnosis  (CAD)  is  a  cost-effective  alternative  to  double 
reading. 

Efforts  to  evaluate  the  usefulness  of  CAD  in  reducing  missed  cancers  are  ongoing.  A 
prospective  study  of  12,860  patients  in  a  community  breast  cancer  center  using  a  commercial 
CAD  system  (ImageChecker  V2.0,  R2  Technologies,  Los  Altos,  CA)  reported  a  cancer  detection 
rate  of  81.6%  (40/49)  with  8  of  the  cancers  initially  detected  by  computer  only.  This 
corresponds  to  a  20%  (41  vs  49)  increase  in  the  number  of  cancers  detected  (6).  These  results 
demonstrate  that  a  CAD  system  can  reduce  the  missed  cancer  rate  when  used  as  a  second  opinion 
even  if  it  does  not  detect  all  cancers. 

The  above  results  do  not  distinguish  between  cancers  presenting  as  masses  alone, 
microcalcification  cluster  alone,  or  a  combination  on  the  mammograms.  In  this  study,  we  focus 
on  the  detection  of  preoperative  mammographic  masses.  We  define  a  preoperative  mass  as  a 
mass  that  is  identified  during  clinical  evaluation  and  either  undergoes  biopsy  based  on  this  exam, 
or is followed and determined to be benign. Castellino et al. found that the latest version of the
R2 ImageChecker achieved a mass detection sensitivity of 85.7% with 0.5 marks per image for
627  preoperative  mass  cases,  up  from  a  sensitivity  of  74.7%  and  1.0  marks  per  image  in  a 
previous  (V2.0)  release  (7).  A  second  study  evaluating  the  Second  Look  system  (CADx  Medical 
Systems,  Laval,  Quebec,  Canada)  reported  a  mass  detection  sensitivity  of  84%  with  1.1  marks 
per  image  on  a  database  of  149  preoperative  mass  cases  (8). 

The  purpose  of  this  paper  is  to  define  the  performance  of  our  CAD  mass  detection  algorithm 
in  marking  preoperative  masses.  This  paper  differs  from  previous  publications  in  that  the 
performance  is  given  for  both  malignant  and  benign  lesions,  and  on  a  per-film  and  a  per-case 
basis.  The  benign  lesion  performance  is  included  because  the  prevalence  of  CAD  markers  for 
benign  lesions  is  different  from  that  for  normal  structures.  In  addition,  the  detection  of  benign 
lesions  may  affect  the  performance  of  radiologists  using  CAD  differently  from  the  influence  of 
normal  markers.  Since  an  independent  data  set  was  used  for  the  evaluation  (i.e.,  the  algorithm 
parameters  were  fixed  without  any  influence  from  the  test  data  set),  the  results  presented  here 
should  give  an  indication  of  the  potential  benefits  of  our  algorithm  when  used  in  clinical  practice. 
We  also  discuss  the  major  factors  that  lead  to  differences  in  detection  performance  between 
malignant  and  benign  masses,  and  performance  differences  from  cases  collected  from  different 
institutions. 

MATERIALS  AND  METHODS 

Data  Sets 

This  study  involved  the  collection  of  mammographic  films  and  biopsy  information  for  the 
evaluation of a CAD mass detection algorithm. These cases were collected with Institutional
Review Board (IRB) approval and the IRB determined that, with our protocol of maintaining
patient  confidentiality,  no  patient  consent  was  needed  for  data  collection. 

Training  Cases 

The  clinical  mammograms  used  for  training  the  algorithm  were  selected  from  the  files  of 
patients  who  had  undergone  biopsy.  The  mammograms  were  acquired  with  MinR/MRE 
screen/film  systems  (Eastman  Kodak,  Rochester,  NY)  using  dedicated  processing.  The  selection 
criterion  used  by  the  radiologists  was  that  a  biopsy-proven  mass  existed  on  the  mammogram. 
The  data  set  consisted  of  253  mammograms  from  102  patients,  and  it  included  128  malignant 
and  125  benign  masses.  Sixty-three  of  the  malignant  and  six  of  the  benign  masses  were  judged 
to be spiculated by an MQSA-approved radiologist.

The  mammograms  were  digitized  with  a  DIS-1000  laser  film  scanner  (Lumisys  Inc, 
Sunnyvale, CA) with a pixel size of 100 µm and 12-bit gray level resolution. The gray levels
were  linearly  proportional  to  optical  density  in  the  0.1  to  2.8  optical  density  unit  (O.D.)  range 
and  gradually  fell  off  in  the  2.8  to  3.5  O.D.  range. 

Independent  Test  Cases 

We  analyzed  the  performance  of  the  trained  mass  detection  algorithm  with  independent 
clinical  cases.  These  mammograms  were  not  used  in  the  training  process.  Cases  were  collected 
from  two  different  institutions.  The  first  set  of  cases,  referred  to  as  Group  1,  was  selected  from 
the  files  of  127  patients  who  had  undergone  biopsy  at  our  institution.  The  mammograms  were 
acquired  with  MinR/MRE  screen/film  systems  using  dedicated  processing  in  the  years  prior  to 
1997 and a Kodak 2000 screen/film system (Eastman Kodak, Rochester, NY) from 1997 on.
Each case consisted of a single craniocaudal (CC) and either a mediolateral oblique (MLO) or
lateral  view  of  the  breast  containing  the  mass.  For  simplicity,  we  will  refer  to  all  views  other 
than  the  CC  view  as  the  MLO  view  in  the  following  discussions  with  the  understanding  that  this 
also  includes  some  lateral  views.  If  both  breasts  of  a  patient  had  a  mass,  each  breast  was 
considered  to  be  an  independent  case.  Using  these  breast-based  definitions,  a  total  of  138  cases 
(with  276  mammograms)  were  available.  Each  case  contained  preoperative  breast  masses  that 
were  identified  by  a  radiologist  during  clinical  evaluation.  The  independent  Group  1 
mammograms  were  digitized  with  a  Lumisys  LS  85  laser  film  scanner  (Lumisys  Inc,  Sunnyvale, 
CA) that digitized the images at 50 µm and 12-bit gray level resolution. The gray levels were
calibrated to be linearly proportional to optical density in the 0.1 to 4.0 O.D. range. The images
were reduced to a 100 µm pixel size by averaging 2 x 2 pixel neighborhoods before performing
mass  detection. 
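
The reduction from a 50 µm to a 100 µm pixel size by 2 x 2 averaging can be illustrated with a short sketch. It assumes the image is held as a list of rows of gray levels and that an odd trailing row or column is simply dropped; the function name downsample_2x2 and these conventions are hypothetical and are not taken from the original implementation.

```python
def downsample_2x2(image):
    """Average non-overlapping 2 x 2 blocks of a 2-D list of gray levels.

    An odd trailing row or column is dropped (an assumption of this sketch).
    """
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2 if image else 0
    reduced = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            # Sum the four pixels of the block, then take their mean.
            block_sum = (image[r][c] + image[r][c + 1]
                         + image[r + 1][c] + image[r + 1][c + 1])
            out_row.append(block_sum / 4.0)
        reduced.append(out_row)
    return reduced
```

Each output pixel then covers four times the area of an input pixel, which is what halves the linear resolution.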

Clinical  cases  from  the  public  database  available  from  the  University  of  South  Florida  (USF) 
were  also  analyzed  (9).  We  evaluated  an  additional  142  CC/MLO  pairs  from  136  patients 
collected  by  USF.  For  compatibility  with  the  Group  1  database,  we  only  selected  USF  cases 
digitized  with  the  Lumisys  200  laser  film  scanner  (Lumisys  Inc,  Sunnyvale,  CA).  This  scanner 
again digitized the images at 50 µm and 12-bit gray level resolution but the gray levels were
calibrated  to  be  linearly  proportional  to  optical  density  in  the  0.1  to  3.6  O.D.  range.  The  142 
USF  cases  will  be  referred  to  as  the  Group  2  cases  in  the  following  discussions. 

Table  1  summarizes  the  Group  1  and  2  test  cases  used  to  evaluate  the  mass  detection 
algorithm.  It  includes  the  number  of  malignant  and  benign  masses  separated  by  whether  they 
were visible in both views or only in a single view. Fig. 1 shows the distributions of lesion
subtlety (1: subtle to 5: obvious) for the Group 1 and 2 databases as ranked by a radiologist
evaluating  each  individual  mass.  The  rankings  of  all  Group  2  masses  were  retrieved  from  the 
USF  database.  The  mammographic  size  for  the  Group  1  masses  was  measured  by  the  radiologist 
during  initial  case  evaluation.  The  malignant  Group  1  masses  had  a  mean  size,  standard 
deviation  and  median  size  of  15.4  mm,  12.0  mm,  12.0  mm,  respectively.  The  benign  Group  1 
masses  had  a  mean  size,  standard  deviation  and  median  size  of  13.4  mm,  11.8  mm  and  10.0  mm, 
respectively.  Radiologist-measured  mass  sizes  were  not  available  for  the  Group  2  cases  and  we 
found  that  the  annotations  outlining  the  mass  locations  in  these  cases  were  much  larger  than  the 
actual  mammographic  lesion  size.  Therefore,  mass  size  information  is  not  reported  for  the 
Group  2  cases. 

The  IRB  did  not  require  the  collection  of  racial  or  ethnic  information  from  the  subjects  at  our 
institution  so  no  statistics  on  the  racial  or  ethnic  composition  are  available  for  the  Group  1  cases. 
However,  since  the  cases  were  randomly  sampled  from  the  patients  undergoing  mammographic 
exams  in  our  hospital,  the  composition  is  expected  to  be  similar  to  that  of  our  patient  population. 
The  ethnicity  statistics  for  our  mammography  screening  patient  population  in  1998  and  1999  are 
given  in  Table  2.  Table  2  also  includes  the  patient  ethnicity  statistics  for  the  Group  2  cases, 
which  were  collected  in  the  USF  public  database. 

Mass  Detection  Algorithm 

Algorithm  Description 

Our  mass  detection  scheme  uses  adaptive  enhancement,  object-based  border  refinement  and 
feature classification to identify potential breast masses. The block diagram for the scheme is
shown in Fig. 2. The first step is to digitize a film mammogram. The digitized mammogram is
then  processed  by  an  initial  segmentation  step,  in  which  a  density-weighted  contrast 
enhancement  (DWCE)  filter  is  utilized  for  preprocessing.  The  DWCE  filter  was  developed  to 
accentuate  mammographic  structures  before  edge  detection  by  adaptively  enhancing  the  local 
contrast.  After  DWCE  filtering,  edge  detection  is  employed  to  define  the  borders  of  the 
enhanced  structures.  This  results  in  a  set  of  detected  structures.  Each  of  these  structures  is  then 
processed  by  a  local  refinement  stage.  First,  seed  locations  are  identified  by  finding  all  local 
maxima within each object, using an ultimate erosion technique (10) and then selecting all
connected pixels with gray values in the range M ± 0.01·M, where M is the gray level of the
local maximum. K-means clustering is then applied to a 25 mm x 25 mm
background-corrected region of interest (ROI) (11) centered on each seed object to refine the initial object
border  (12).  The  purpose  of  the  local  refinement  stage  is  to  improve  the  accuracy  of  object 
borders  found  by  the  DWCE  segmentation  because  DWCE  segmentation  tends  to  underestimate 
the  size  of  breast  structures.  The  local  refinement  was  also  found  to  be  effective  in  splitting  large 
connected  regions  into  smaller  breast  structures.  The  final  stage  is  to  classify  each  detected 
object  as  a  breast  mass  or  normal  structure  based  on  extracted  morphological  and  texture 
features.  In  order  to  overcome  the  problems  associated  with  the  large  number  of  initial 
structures,  we  perform  the  feature  classification  in  two  stages.  Eleven  morphological  features 
are  initially  used  with  a  threshold  and  a  linear  classifier  to  remove  detected  normal  structures  that 
are  significantly  different  from  breast  masses.  Texture-based  classification  then  follows  this 
morphological  reduction  stage.  Fifteen  global  and  local  multiresolution  texture  features,  based 
on the spatial gray level dependence (SGLD) matrices, are used as inputs to a linear discriminant
classifier, which merges the input features into a single discriminant score for each detected
object. Decision thresholds based on this score and on the maximum number of marks allowed
per  image  are  then  used  to  identify  potential  breast  masses.  Further  details  on  the  mass  detection 
algorithm  can  be  found  in  the  literature  (13-16). 
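
The final decision step described above, a score threshold combined with a cap on the number of marks per image, can be sketched as follows. This is an illustration only: the function name select_marks, the (object_id, score) representation, and the convention that higher scores are more mass-like are assumptions and are not taken from the original implementation.

```python
def select_marks(objects, score_threshold, max_marks):
    """Return the objects kept as potential masses for one image.

    objects: list of (object_id, discriminant_score) pairs.
    Higher scores are assumed to be more mass-like (an assumption of this sketch).
    """
    # Keep only objects whose discriminant score passes the decision threshold.
    candidates = [obj for obj in objects if obj[1] >= score_threshold]
    # Rank the survivors so that the most mass-like objects come first.
    candidates.sort(key=lambda obj: obj[1], reverse=True)
    # Enforce the maximum number of marks allowed per image.
    return candidates[:max_marks]
```

Varying score_threshold while holding max_marks fixed is one way to trade sensitivity against the number of marks shown to the radiologist.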

Algorithm  Training 

The  computer  program  was  trained  using  the  entire  training  data  set  of  253  mammograms. 
This  included  adjusting  the  filters,  clustering,  selected  features  and  classification  thresholds. 
Once  training  was  completed  the  parameters  and  all  thresholds  were  fixed  for  testing.  The 
training  data  set  was  then  resubstituted  into  the  algorithm  and  was  found  to  have  a  film-based 
(i.e.,  each  mass  on  each  film  was  considered  as  an  independent  sample)  training  sensitivity  of 
81%  (85%  for  malignant  masses).  The  mass  detection  algorithm  produced  2.9  marks  per  film  on 
average  for  the  training  cases.  It  is  important  to  note  that  the  detection  classifiers  considered 
only  classification  between  breast  masses  and  normal  tissue,  not  between  malignant  and  benign 
masses.  Therefore,  no  distinction  was  made  between  malignant  and  benign  masses  in  the 
training  process. 

Definition  of  TP  and  FP  Markers 

For  the  Group  1  cases,  the  smallest  bounding  box  containing  the  entire  mass  identified  by  a 
radiologist  was  used  as  the  truth.  For  Group  2,  we  used  a  bounding  box  around  the  annotated 
region  provided  with  each  image.  Our  definition  of  a  TP  was  based  on  the  percentage  of  overlap 
between  the  bounding  box  of  an  identified  structure  and  the  bounding  box  of  the  true  mass. 
Based  on  the  training  set,  we  chose  an  overlap  threshold  of  25%.  This  value  corresponds  to  the 
minimum  overlap  between  the  bounding  box  of  a  detected  object  and  the  bounding  box  of  a  mass 
in order for the object to be considered as a TP detection. All detected objects that did not meet
this criterion were considered as FPs. The 25% threshold was selected because it was found to
match  well  with  TPs  identified  visually.  The  detected  objects  were  first  labeled  automatically  by 
the  computer  using  this  criterion.  All  of  the  TPs  were  then  visually  reviewed  to  make  sure  that 
the  program  highlighted  the  true  lesion  and  not  a  neighboring  structure.  Marks  that  were  found 
to  match  neighboring  structures  were  changed  to  FPs. 
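
The automatic labeling step can be sketched as a bounding-box overlap test. The text does not state which area the overlap percentage is normalized by, so the sketch below assumes the intersection area divided by the area of the true-mass bounding box; that choice, the (x_min, y_min, x_max, y_max) box format, and the function names are all hypothetical. The subsequent visual review of TPs is not modeled.

```python
def overlap_fraction(detected, truth):
    """Fraction of the truth box covered by the detected box.

    Boxes are (x_min, y_min, x_max, y_max). Normalizing by the truth-box area
    is an assumption; the text does not state the denominator.
    """
    ix = min(detected[2], truth[2]) - max(detected[0], truth[0])
    iy = min(detected[3], truth[3]) - max(detected[1], truth[1])
    if ix <= 0 or iy <= 0:
        return 0.0  # the boxes do not intersect
    truth_area = (truth[2] - truth[0]) * (truth[3] - truth[1])
    return (ix * iy) / truth_area if truth_area > 0 else 0.0


def label_detections(detections, truth_boxes, threshold=0.25):
    """Label each detection 'TP' if it overlaps any truth box by at least
    the threshold fraction, and 'FP' otherwise."""
    labels = []
    for det in detections:
        is_tp = any(overlap_fraction(det, t) >= threshold for t in truth_boxes)
        labels.append("TP" if is_tp else "FP")
    return labels
```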

The  number  of  false  positive  (FP)  marks  produced  by  the  algorithm  was  determined  by 
counting  the  markings  produced  in  normal  cases.  We  used  lesion-free  films  of  the  breast 
contralateral  to  the  breast  containing  an  abnormality  as  normal  cases.  Since  some  cases 
contained  lesions  in  both  breasts,  we  have  fewer  normal  films  than  abnormal  films.  We  used  a 
total  of  251  normal  films  from  Group  1  and  252  normal  films  from  Group  2  to  define  the  marker 
rate.  The  TPF,  calculated  from  the  abnormal  cases,  and  the  average  number  of  marks  per  image, 
calculated  from  the  normal  cases,  were  defined  for  a  fixed  set  of  thresholds  at  the  final  texture 
classification  stage.  The  TPF  and  the  average  number  of  marks  per  image  as  the  threshold  varied 
were then used to plot the FROC performance curves for malignant and benign masses in the
different  data  sets. 
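
The construction of the FROC points from the two sets of films can be sketched as follows. The sketch assumes that each true mass has already been reduced to the best score among the detected objects labeled as TPs for it (or None if there was none), and that higher scores are more mass-like; these conventions and the function name froc_points are illustrative assumptions. The cap on the number of marks per image is omitted for brevity.

```python
def froc_points(lesion_scores, normal_film_scores, thresholds):
    """Compute (marks per image, TPF) pairs for a list of score thresholds.

    lesion_scores: best score of a TP-labeled object for each true mass,
        or None when no detected object overlapped the mass.
    normal_film_scores: one list of object scores per normal film.
    Higher scores are assumed to be more mass-like.
    """
    points = []
    for t in thresholds:
        # TPF comes from the abnormal cases only.
        detected = sum(1 for s in lesion_scores if s is not None and s >= t)
        tpf = detected / len(lesion_scores)
        # The marker rate comes from the normal films only.
        marks = sum(1 for film in normal_film_scores for s in film if s >= t)
        points.append((marks / len(normal_film_scores), tpf))
    return points
```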

RESULTS 

Test  performance  results  are  presented  on  a  per-film  and  per-case  basis.  In  the  former,  the 
CC  and  MLO  views  are  considered  independently  so  that  a  lesion  visible  in  the  CC  view  is 
considered  as  a  TP  and  the  same  lesion  in  the  MLO  view  is  a  second  TP.  In  the  latter,  a  mass  is 
considered  detected  if  it  is  detected  on  either  the  CC  or  the  MLO  view.  The  latter  evaluation 
takes into consideration that, in clinical practice, once the computer alerts the radiologist to a
cancer in one view, it is unlikely that the radiologist will miss the cancer. The per-case approach
is often used by other researchers in reporting their CAD performance (5, 8, 17). Results are also
presented  for  two  different  TP  scoring  methods.  The  individual  scoring  method  considers  each 
mass  in  a  film  or  case  as  a  different  TP.  The  grouped  scoring  method  considers  all  malignant 
masses  in  a  film  or  case  as  a  single  TP  (5).  The  rationale  for  group  scoring  is  that  a  radiologist 
may  not  need  to  be  alerted  to  all  malignant  lesions  in  a  film  or  case  before  taking  action. 
Therefore,  multiple  detections  in  a  film  or  case  may  not  significantly  enhance  the  power  of  CAD. 
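
The difference between per-film and per-case sensitivity under individual scoring can be made concrete with a small sketch. It assumes each mass is described by a dictionary mapping the views in which it is visible to a detected/missed flag; this data layout and the function name sensitivities are hypothetical.

```python
def sensitivities(cases):
    """Return (per_film_tpf, per_case_tpf) under individual scoring.

    cases: list of dicts, one per mass, mapping a view name to True/False
        (detected or missed); a view is omitted when the mass is not
        visible in it.
    """
    # Per-film: every view of every mass counts as an independent sample.
    film_hits = sum(1 for mass in cases for hit in mass.values() if hit)
    film_total = sum(len(mass) for mass in cases)
    # Per-case: a mass counts as detected if it is found on either view.
    case_hits = sum(1 for mass in cases if any(mass.values()))
    return film_hits / film_total, case_hits / len(cases)
```

For example, a mass detected on the CC view but missed on the MLO view contributes one hit out of two films to the per-film figure, yet counts as fully detected in the per-case figure.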

The  FROC  curves  for  the  mass  detection  with  the  individual  data  sets  are  shown  in  Figs.  3-5. 
Figs.  3  and  4  contain  the  FROC  performance  curves  for  Group  1  and  2  based  on  individual  mass 
scoring.  The  FROC  performance  of  the  algorithm  for  malignant  masses  based  on  grouped  mass 
scoring  is  shown  in  Fig.  5.  We  also  analyzed  the  sensitivity  achieved  by  the  mass  detection 
algorithm  at  three  fixed  normal  marker  rates.  These  marker  rates  were  selected  because  they 
represent  potential  clinical  implementations  for  a  CAD  algorithm  based  on  previously  published 
studies  (7,  8).  The  results  at  these  fixed  marker  levels  are  summarized  in  Table  3. 

DISCUSSION 

The  detection  performance  curves  shown  in  Figs.  3-5  clearly  indicate  that  our  mass  detection 
algorithm  is  effective  in  detecting  breast  masses  and  that  the  detection  performance  for 
malignant  masses  is  better  than  that  for  benign  masses.  Since  these  results  were  based  on  a  large 
independent  test  set  from  two  institutions,  and  the  algorithm  parameters  were  not  adjusted  based 
on  any  characteristics  of  the  test  set,  the  performance  results  estimated  in  this  study  should  be 
close to the true performance in the patient population. The malignant mass detection fractions
of 77%, 83% and 87% at marker rates of 0.5, 1.0 and 1.5, respectively, are currently the best
estimates for the clinical performance of our mass detection algorithm. These performance
values compare quite well with the published performance results from the commercial CAD
vendors (85.7% TPF at 0.5 marks/image by R2 and 84% TPF at 1.1 marks/image by CADx) (7,
8), as well as with other algorithms currently under development in research laboratories. This
indicates  that  our  mass  detection  could  be  beneficial  to  radiologists  as  a  second  opinion. 

We  first  compare  the  performance  of  the  algorithm  on  malignant  and  benign  lesions  in  each 
database.  From  Table  3  and  Fig.  3,  we  see  a  somewhat  consistent  difference  in  the  TPFs 
between  the  malignant  and  benign  masses  in  Group  1.  The  per-case  difference  is  in  the  range  of 
9%  to  12%  from  3.0  to  about  0.25  marks/image.  Per-film  results  for  Group  1  follow  the  same 
trend.  The  difference  between  the  malignant  and  benign  masses  is  larger  in  the  USF  database 
(see  Table  3  and  Fig.  4).  Here  the  per-case  difference  starts  at  about  12%  at  3.0  marks/image, 
increases  to  19%  at  1.5  marks/image  and  then  to  34%  at  1.0  mark/image.  The  per-film 
performance again follows a similar trend as shown in Fig. 4. It is clear that the performance on
the  Group  2  benign  cases  is  much  lower  than  that  for  the  Group  1  benign  cases.  However,  the 
performance  difference  between  the  Group  1  and  Group  2  malignant  masses  is  small. 

One  possible  cause  for  the  performance  differences  between  benign  and  malignant  masses  is 
a  difference  in  lesion  subtlety.  From  Fig.  1,  we  observe  a  small  difference  in  rated  subtlety 
between  the  benign  and  malignant  masses  in  the  Group  1  database,  with  the  malignant  masses 
being  slightly  more  obvious  than  the  benign  masses.  This  same  trend  holds  for  the  Group  2 
masses.  However,  it  should  be  noted  that  the  subtlety  distributions  between  Group  1  and  Group 
2 differ considerably as will be discussed below. The observed difference in subtlety between
benign and malignant masses for both Groups 1 and 2 is not particularly large so a subtlety
difference  does  not  seem  to  fully  explain  the  large  disparities  observed  in  Figs.  3  and  4.  Another 
factor  that  likely  contributes  to  the  observed  difference  is  that  malignant  masses  are  more  likely 
to  be  spiculated  than  benign  masses  and  our  algorithm’s  performance  on  spiculated  masses  is 
superior  to  its  performance  on  non-spiculated  masses.  In  the  Group  1  database,  34%  (49/146)  of 
the  malignant  and  5%  (8/159)  of  the  benign  masses  were  spiculated.  There  were  33%  (65/197) 
and  0%  (0/132)  spiculated  masses  in  the  Group  2  malignant  and  benign  cases,  respectively.  In 
our  training  set,  49%  (63/128)  of  the  malignant  and  6%  (8/125)  of  the  benign  lesions  were 
judged  as  spiculated  by  radiologists.  A  comparison  between  spiculated  and  non-spiculated  mass 
performances  is  shown  in  Fig.  6.  The  spiculated  benign  mass  curve  is  not  included  because  of 
the  small  number  of  lesions  in  this  category.  It  is  clear  from  Fig.  6  that  the  algorithm  is  better 
suited to detect spiculated masses, especially at the lower marker rates, although no special
effort was made to train the algorithm to detect spiculated masses. We surmise that the texture
analysis acquired a higher sensitivity to spiculated masses during the training process because of
the relatively large fraction of spiculated lesions in the training set. Even though the detection
algorithm had a higher sensitivity in detecting spiculated masses, the large number of
non-spiculated masses in the training set (182/253) still trained the algorithm to be sensitive to
non-spiculated malignant masses. The sizable difference between the malignant and benign
non-spiculated mass curves suggests that some additional, as yet undetermined, factors may also be
contributing to the observed performance difference between malignant and benign masses.

We  also  observe  performance  differences  between  masses  in  Groups  1  and  2.  The  malignant 
mass  detection  performances  are  quite  similar  as  shown  in  Fig.  5,  but  the  detection  of  benign 
lesions differs considerably between the groups. One potential factor is that 94% (147/157) of
the benign masses in the Group 1 database underwent biopsy. This high rate of benign biopsy
suggests that the Group 1 masses were judged by the radiologist to be similar enough to
malignant  masses  to  warrant  biopsy  (i.e.,  the  vast  majority  of  the  lesions  were  ACR  BI-RADS 
category  4  and  5).  We  therefore  expect  the  detection  performance  of  these  benign  masses  to  be 
somewhat  similar  to  that  of  the  malignant  masses  for  Group  1.  The  number  of  benign  biopsies 
was  not  available  for  the  Group  2  database,  but  it  is  likely  that  a  smaller  fraction  of  the  benign 
lesions underwent biopsy, leading to a larger fraction of ACR BI-RADS category 2 or 3 lesions.
If  this  is  true,  then  the  Group  2  benign  masses  would  not  match  the  characteristics  of  our  training 
set  as  well  and  may  therefore  be  more  difficult  to  detect.  Another  factor  that  may  have 
contributed  to  this  performance  difference  is  a  difference  in  the  optical  density  ranges  of  the 
digitizers used to acquire the cases at each institution. The O.D. ranges were 0-3.5, 0-4.0 and
0-3.6 for the Lumisys digitizers used to digitize the training, Group 1, and Group 2 mammograms,
respectively.  The  smaller  O.D.  range  of  the  digitizer  used  to  digitize  the  Group  2  mammograms 
may  have  caused  a  decrease  in  the  detection  performance  for  subtle  low-density  lesions 
compared  with  the  Group  1  performance  in  similar  cases.  However,  the  Group  2  digitizer  has  an 
advantage  in  many  of  the  cases  because  it  better  matches  the  O.D.  range  of  the  digitizer  used  to 
acquire  the  training  set.  Because  of  the  presence  of  other  factors  such  as  case  variability,  it  is 
difficult  to  differentiate  the  relative  importance  of  these  competing  effects  on  the  algorithm’s 
performance. 

Comparing  the  subtlety  between  the  Group  1  and  Group  2  databases,  we  observed  a  large 
disparity  in  the  radiologists’  rankings.  From  the  subtlety  histograms  in  Fig.  1,  one  may  conclude 
that the Group 2 cases are much easier than the Group 1 cases for both malignant and benign
masses.  However,  this  does  not  agree  with  our  detection  results.  The  detection  performance  for 
the  Group  1  benign  cases  was  much  better  than  that  for  the  Group  2  benign  cases  even  though 
the  Group  1  lesions  were  ranked  as  more  subtle.  The  much  more  “obvious”  malignant  masses  in 
the Group 2 database resulted in only a small 1%-2% gain in the detection performance when
compared  with  the  Group  1  malignant  cases  (Table  3).  Likewise,  visual  comparison  of  the  cases 
did  not  reveal  such  a  large  difference  between  the  databases.  The  Group  2  subtlety  distribution 
does  not  match  well  with  what  is  expected  in  clinical  practice  because  it  is  highly  skewed 
towards  obvious.  One  would  expect  that  a  randomly  drawn  sample  from  the  patient  population 
would  follow  a  distribution  much  more  similar  to  the  Group  1  histogram.  Therefore,  the  subtlety 
difference  is  most  likely  caused  by  a  difference  in  the  subjective  criteria  used  to  define  lesion 
subtlety  instead  of  a  true  difference  in  subtlety  between  the  cases.  It  is  likely  that  the  individual 
radiologists  used  different  scales  at  the  different  institutions.  The  radiologist  reading  cases  from 
institution  1  appeared  to  have  spread  their  subtlety  ratings  across  the  multiple  categories  while 
the  radiologists  at  institution  2  seemed  to  have  used  basically  a  binary  decision  of  visible  or  not 
visible.  The  results  suggest  that  caution  must  be  taken  when  comparing  detection  results  using 
different databases. Even if subtlety ratings are available, the rating criteria may be subject to
large inter- and intra-observer variations. This is especially true if the databases are collected by
different  institutions.  Comparisons  between  lesions  rated  at  a  single  institution  using  a  consistent 
rating  criterion  (e.g.,  comparing  malignant  and  benign  lesions  from  the  same  data  set)  are  much 
less  problematic. 

The  results  show  that  our  automated  mass  detection  algorithm  is  capable  of  detecting 
malignant masses in mammograms with a low FP marker rate, suggesting that this CAD
algorithm  may  be  useful  as  a  second  reader  in  the  clinical  interpretation  of  mammograms. 
Further  studies  are  underway  to  both  improve  the  detection  performance  and  reduce  the  marker 
rate  of  the  algorithm  with  single-view  information.  Studies  are  also  underway  to  determine  how 
well  the  mass  detection  performs  on  prior  mammograms  in  which  the  lesion  was  not  sent  for 
biopsy.  Good  performance  in  prior  cases  may  lead  to  earlier  cancer  detection.  We  are  also 
developing  a  new  technique  that  will  incorporate  information  from  different  mammographic 
views of the same breast (18, 19). Our preliminary results indicate that two-view information
fusion  will  improve  sensitivity  and  reduce  FPs  in  our  detection  algorithm.  Studies  will  also  be 
conducted  to  determine  if  our  CAD  algorithm  aids  radiologists  in  detecting  breast  cancer  earlier 
and  if  it  affects  their  recall  rate. 
TABLES


Table 1

Summary of the cases, patients and masses in the Group 1 and Group 2 databases.

                                        Abnormal                                      Normal
             ------------------------------------------------------------     ----------------
                  Total             Malignant                Benign
Database     Films  Patients   One View   Two View     One View   Two View     Films  Patients
                               Masses*    Masses†      Masses*    Masses†

Individual Masses‡
  Group 1     276     127         2          72            3          78        251      93
  Group 2     284     136         5          96            6          63        252     128

Grouped Masses§
  Group 1     128      64         -          64            -          -         251      93
  Group 2     184      92         -          92            -          -         252     128

*One view masses correspond to the masses that are visible in only one mammographic view in
the pair.

†Two view masses correspond to masses that are visible in both mammographic views in the
pair.

‡The individual masses category considers each mass in a film or case as a TP during scoring.
The number of abnormal films and patients include cases with both malignant and/or benign
masses.

§The grouped masses category considers all malignant masses for a film or case together as one
TP during scoring. The number of abnormal films and patients include only cases with malignant
masses.



Table 2

Summary of the ethnic composition of the Group 1 and 2 patient populations.

                                                         Percentage of Population
Ethnicity*                                               Group 1†      Group 2†

American Indian or Alaskan native (American Indian)        0.2           0.1
Asian or Pacific islander (Asian)                          2.8           0.2
Black, not of Hispanic origin                              7.0          20.4
Hispanic (Spanish Surname)                                 0.5           1.8
White, not of Hispanic origin (White)                     83.5          77.0
Other/Unknown                                              6.0           0.4

Note. The sum of all categories may not equal 100% because of rounding errors.

*The ethnicity text is the actual label used by Institution 1 in the Group 1
description. The text in parentheses is the corresponding label used by
Institution 2 in the Group 2 description, if it differed.

†Percentages without fractions are given because ethnicity is based on a
larger mammographic patient population, not from the particular cases
used in this study.
Table 3

Summary of the per-case mass detection performance at marker rates of 0.5, 1.0, and 1.5
marks per image.

                                              TPF*
Data Set                   0.5 Marks         1.0 Marks         1.5 Marks

Individual Malignant†
  Group 1                  55/74 (74)        59/74 (80)        63/74 (85)
  Group 2                  76/101 (75)       83/101 (82)       84/101 (83)
  Combined                 131/175 (75)      142/175 (81)      147/175 (84)

Individual Benign†
  Group 1                  51/81 (63)        58/81 (72)        60/81 (74)
  Group 2                  23/68 (34)        33/68 (49)        44/68 (65)
  Combined                 74/149 (50)       91/149 (61)       104/149 (70)

Grouped Malignant‡
  Group 1                  49/64 (77)        53/64 (83)        55/64 (86)
  Group 2                  71/92 (77)        77/92 (84)        80/92 (87)
  Combined                 120/156 (77)      130/156 (83)      135/156 (87)

Note. The numbers in parentheses are percentages.

*The TPFs are given for the three different normal film marker rates.

†Each individual mass in a film or case is considered as a TP for the individual malignant
and benign categories.

‡All malignant masses for a film or case are considered together as one TP for the
grouped malignant category.
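The difference between individual and grouped scoring can be illustrated with a short sketch. It is not the scoring code used in the study: the input layout (a list of case identifier and detected flag per mass) and the function names are assumptions made for illustration, and grouped scoring is assumed to count a case as detected when any one of its masses is marked.

```python
from collections import defaultdict


def tpf_individual(masses):
    """Each mass counts as its own potential TP.

    masses: list of (case_id, detected) tuples (hypothetical layout).
    Returns (number detected, number of masses).
    """
    hits = sum(1 for _, detected in masses if detected)
    return hits, len(masses)


def tpf_grouped(masses):
    """All masses of a case count together as one potential TP.

    Assumption: the case is a TP if at least one of its masses is detected.
    Returns (number of detected cases, number of cases).
    """
    by_case = defaultdict(bool)
    for case_id, detected in masses:
        by_case[case_id] = by_case[case_id] or detected
    hits = sum(1 for detected in by_case.values() if detected)
    return hits, len(by_case)


# Toy example: case "a" has two masses, only one of which is detected.
example = [("a", True), ("a", False), ("b", False), ("c", True)]
print(tpf_individual(example))  # counts per mass
print(tpf_grouped(example))     # counts per case
```

Under these assumptions the grouped fraction can only be equal to or higher than the individual fraction for the same detections, which matches the direction of the differences in Table 3.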




CAPTIONS  FOR  ILLUSTRATIONS 


Fig 1: Histogram of lesion subtlety for the 138 and 142 cases in the Group 1 and Group 2
databases, respectively, as ranked by the radiologist reviewing the cases. Each mass in
each film was rated independently by the radiologist. For comparison purposes, the plot
is of the percentage of masses falling within each category. The total number of masses
for each group can be found in Table 1.

Fig 2: The block diagram for the mass detection scheme evaluated in this study.

Fig 3: The Group 1 FROC performance curves for malignant and benign masses. The per-case
curve is obtained by defining a TP as the detection of the mass in either the CC or MLO
view mammogram of a breast. The per-film curve treats the same mass on the CC and
MLO films independently. Individual mass scoring is used in the figure so each
individual mass in a film or case was considered as a TP.

Fig 4: The Group 2 FROC performance curves for malignant and benign masses. Individual
mass scoring is used in the figure so each individual mass in a film or case was
considered as a TP.

Fig 5: The Group 1 and 2 FROC performance curves for malignant masses. Grouped mass
scoring is used in the figure so all malignant masses in a film or case were considered
together as one TP.

Fig 6: The combined Group 1 and 2 FROC performance curves for spiculated and
non-spiculated masses. The benign spiculated mass curve is not shown because of the small
number of cases in this category. Individual mass scoring is used in the figure so each
individual mass in a film was considered as a TP.
Abstract
Recent clinical studies have proved that computer-aided diagnosis (CAD) systems are
helpful for improving lesion detection by radiologists in mammography. However, these
systems would be more useful if the false-positive rate were reduced. Current CAD systems
generally  detect  and  characterize  suspicious  abnormal  structures  in  individual  mammographic 
images.  Clinical  experiences  by  radiologists  indicate  that  screening  with  two  mammographic 
views  improves  the  detection  accuracy  of  abnormalities  in  the  breast.  It  is  expected  that  fusion 
of  information  from  different  mammographic  views  will  improve  the  performance  of  CAD 
systems.  We  are  developing  a  two-view  matching  method  that  utilizes  the  geometric 
locations,  and  morphological  and  textural  features  to  correlate  objects  detected  in  two 
different  views  using  a  prescreening  program.  First,  a  geometrical  model  is  used  to  predict  the 
search  region  for  an  object  in  a  second  view  from  its  location  in  the  first  view.  The  distance 
between  the  object  and  the  nipple  is  used  to  define  the  search  area.  After  pairing  the  objects  in 
two  views,  textural  and  morphological  characteristics  of  the  paired  objects  are  merged  and 
similarity  measures  are  defined.  Linear  discriminant  analysis  is  then  employed  to  classify 
each  object  pair  as  a  true  or  false  mass  pair.  The  resulting  object  correspondence  score  is 
combined  with  its  one-view  detection  score  using  a  fusion  scheme.  The  fusion  information 
was  found  to  improve  the  lesion  detectability  and  reduce  the  number  of  FPs.  In  a  preliminary 
study,  we  used  a  data  set  of  169  pairs  of  cranio-caudal  (CC)  and  mediolateral  oblique  (MLO) 
view  mammograms.  For  the  detection  of  malignant  masses  on  current  mammograms,  the 
film-based  detection  sensitivity  was  found  to  improve  from  62%  with  a  one-view  detection 
scheme to 73% with the new two-view scheme, at a false-positive rate of 1 FP/image. The
corresponding case-based detection sensitivity improved from 77% to 91%.

Keywords:  computer-aided  diagnosis,  mammography,  mass  detection,  classification,  fusion 
of  information. 
I. Introduction


X-ray  mammography  is  the  only  proven  diagnostic  technique  for  detecting  breast  cancer  in 

its  early  stages.^’  ^  In  mammographic  screening,  a  cranio-caudal  (CC)  and  a  mediolateral 
oblique  (MLO)  view  are  routinely  taken  for  each  breast.  The  two  views  not  only  allow  most 
of  the  breast  tissue  to  be  imaged  but  also  improve  the  chance  that  a  lesion  will  be  seen  in  at 
least  one  of  the  views.  Radiologists  analyze  the  different  mammographic  views  to  detect 
calcifications  and  masses  that  may  be  a  sign  of  breast  cancer  and  to  decide  whether  to  call  the 
patient  back  for  further  diagnostic  evaluations.  They  also  use  the  two  views  to  reduce  false 
positives  such  as  overlapping  dense  tissue  in  one  view  that  mimics  masses.  Their 
interpretation  integrates  complex  criteria  of  human  vision  and  intelligence,  including 
morphology,  texture,  and  geometric  location  of  any  suspicious  structures  of  the  imaged  breast, 
combining  information  from  different  views,  checking  differences  between  the  two  breasts, 
and  looking  for  changes  between  the  prior  and  current  mammograms  when  available.  Clinical 
studies  indicate  that  lesion  detectability  in  two-view  mammograms  is  more  accurate  than 
when  only  one  view  is  available.^  4,5 

It has also been shown that independent double reading by two radiologists significantly

increases  the  sensitivity  of  mammographic  screening.^,  ^  However,  the  increased  cost  and 
workload  to  the  radiologists  make  double  reading  impractical  in  most  screening  situations.  To 
provide  a  second  opinion  to  the  radiologists,  computer-aided  diagnosis  (CAD)  systems  have 
been  developed  using  computer  vision  and  pattern  recognition  techniques  to  automatically 
detect  and  characterize  abnormal  lesions  on  mammograms.  Although  it  has  been  reported  that 
these  systems  are  useful  in  reducing  the  error  rate  in  mammographic  screening,8,9,10  the 
detection  sensitivity  of  these  systems  needs  to  be  improved  and  the  false-positive  (FP)  rate 
reduced  to  provide  maximum  benefit  to  the  radiologist  and  the  patient.  CAD  algorithms 
reported  in  the  literature  so  far  use  one-view  information  for  detection  of  lesions  even  though 
the  accuracy  may  be  scored  and  reported  using  two  views.  Yin  et  al.^  used  bilateral 
subtraction  in  a  prescreening  step  of  a  mass  detection  program  to  locate  mass  candidates,  but 
the  subsequent  image  analysis  was  performed  based  only  on  a  single  view.  Recently, 

Hadjiiski  et  al.l^’  have  developed  an  interval  change  analysis  of  masses  on  current  and 
prior  mammograms  and  found  that  the  classification  accuracy  of  malignant  and  benign 
masses  can  be  improved  significantly  in  comparison  to  single  image  classification.  These 
studies demonstrated the potential of using multiple image information for CAD. However,
current  CAD  algorithms  have  not  utilized  one  of  the  most  important  pieces  of  information 
available  in  a  mammographic  examination  -  the  correlation  of  computer-detected  lesions 
between  the  two  standard  views.  This  is  a  very  difficult  problem  for  computer  vision  because 
the  breast  is  elastic  and  deformable.  The  overlapping  tissue  and  the  relative  position  of  the 
breast  structures  are  generally  different  even  when  the  breast  is  compressed  in  the  same  view 
two  different  times.  The  change  in  geometry  for  an  elastic  object  and  lack  of  invariant 
“landmarks” make it difficult, if not impossible, to correctly register two breast images in the
same  view  by  any  established  image  warping  technique  or  by  using  an  analytic  model  to 
predict  corresponding  object  locations  in  the  different  views  of  the  same  breast. 

Few  studies  have  been  conducted  on  how  to  find  the  relationship  between  structures  in 
different  mammographic  views.  Highnam  et  al.^'^  proposed  a  breast  deformation  model  for 

compressed  breasts  and  Kita  et  al.^^  used  the  model  for  finding  corresponding  points  in  two 
different  views.  They  demonstrated  with  a  data  set  of  26  cases  (a  total  of  37  lesions)  that  this 
method  allowed  prediction  of  location  in  a  second  view  within  a  band  of  pixels  ±26  mm  from 
an  epipolar  line.  However,  assumptions  on  the  parameters  and  the  deformation  of  a 
compressed  breast  had  to  be  made  and  the  robustness  of  the  model  has  yet  to  be  validated. 
More  practical  approaches,  which  do  not  depend  on  a  large  number  of  assumptions,  may  be 
preferable. Good et al. and Chang et al. recently reported preliminary attempts at matching
computer-detected objects in two views. They demonstrated the feasibility of
identifying  corresponding  objects  (Az=0.82)  in  the  two  views  by  exhaustive  pairing  of  the 
detected  objects  and  feature  classification.  None  of  these  studies  attempted  to  use  the  two- 
view  correspondence  information  to  improve  lesion  detection  or  classification. 

During  mammographic  interpretation,  if  a  suspicious  breast  mass  is  found  in  one  view,  the 
radiologist  will  attempt  to  find  the  same  object  in  the  other  available  views  in  order  to  identify 
the  object  as  a  true  or  a  false  mass.  Radiologists  commonly  consider  the  distance  from  the 
nipple to the center of the suspicious lesion in one view and then search for the corresponding
object  in  the  second  view  in  an  annular  region  at  about  the  same  radial  distance  from  the 
nipple.  Based  on  this  approach,  we  previously  developed  a  regional  registration  technique  to 
identify  corresponding  lesion  locations  on  current  and  prior  mammograms  of  the  same 

view. 1^,1 9,^3 We have also designed geometric models that can localize corresponding
lesions within a search region when two-view or three-view mammograms are available for

lesion  localization.^^  With  the  geometric  information,  the  computer  searches  for  a 
corresponding  lesion  in  the  other  view  within  a  limited  search  region.  The  object  of  interest 
can  then  be  matched  with  possible  corresponding  objects  in  the  search  region  using  the 
similarity  of  feature  measures.  We  have  found  that  the  geometric  constraints  improved  the 
chance  of  correctly  matching  lesions  in  current  and  prior  mammograms  for  classification  of 
malignant  and  benign  masses.21  In  this  study,  we  explore  the  use  of  the  regional  registration 
technique  as  a  basis  to  correlate  lesions  in  the  two  views.  The  correspondence  information  is 
used  to  reduce  false  detections  produced  by  our  one-view  CAD  algorithm.  The  detection 
accuracy  of  the  two-view  scheme  was  evaluated  and  compared  to  our  current  one-view  CAD 
scheme  using  free  response  receiver  operating  characteristic  (FROC)  analysis. 
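For readers unfamiliar with FROC analysis, the following sketch shows one common way to obtain FROC operating points from scored detections. It is an illustration under stated assumptions, not the evaluation code of this study: the detection layout (a score plus the identifier of the true mass it overlaps, or `None` for a false positive), the function name `froc_points`, and the choice to count false positives over all images are hypothetical.

```python
def froc_points(detections, n_lesions, n_images):
    """Return (FPs per image, sensitivity) pairs, one per score threshold.

    detections: list of (score, lesion_id) tuples; lesion_id is None for
                a false-positive mark (hypothetical layout).
    n_lesions:  number of true lesions in the data set.
    n_images:   number of images over which FP marks are averaged.
    """
    thresholds = sorted({score for score, _ in detections}, reverse=True)
    points = []
    for t in thresholds:
        kept = [(s, lesion) for s, lesion in detections if s >= t]
        # A lesion counts once, however many marks overlap it.
        hit_lesions = {lesion for _, lesion in kept if lesion is not None}
        n_fp = sum(1 for _, lesion in kept if lesion is None)
        points.append((n_fp / n_images, len(hit_lesions) / n_lesions))
    return points


# Toy example: two lesions, two images, four scored marks.
marks = [(0.9, "m1"), (0.8, None), (0.7, "m2"), (0.6, None)]
for fp_rate, sensitivity in froc_points(marks, n_lesions=2, n_images=2):
    print(fp_rate, sensitivity)
```

Lowering the threshold moves along the curve toward higher sensitivity and more false-positive marks per image, which is the trade-off the one-view and two-view schemes are compared on.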

II. Materials and Methods

Our  approach  to  improving  the  accuracy  of  the  mass  detection  is  to  merge  information 
from  corresponding  segmented  structures  in  the  two  standard  views  of  the  same  breast.  We 
first  assume  that  a  true  mass  will  have  a  higher  chance  of  being  detected  in  both  views. 
Likewise,  we  assume  that  the  objects  corresponding  to  the  same  mass  detected  in  the  two 
different  views  (a  TP-TP  pair)  will  be  more  similar  in  their  feature  measures  than  a  mass 
object  compared  to  normal  tissue  (a  TP-FP  pair),  or  two  false-positives  (an  FP-FP  pair). 
Object matching is performed in two stages. First, all possible pairings of the detected objects
on the two views are determined, taking into account geometric constraints. Second, features
are extracted from each object, similarity measures for the feature pairs are derived, and a
classifier is trained to distinguish true pairs (TP-TP pairs) from false pairs (TP-FP, FP-TP, or
FP-FP pairs) using the similarity measures. The two stages are detailed below. The data sets used
in  the  development  and  evaluation  of  this  approach  are  described  next. 
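The two-stage structure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout of an object, the field name `mass_id`, the predicate argument `is_geometrically_plausible`, and the particular similarity measures (absolute difference and mean of each feature) are all assumptions. The classifier that would be trained on the resulting measures is not shown.

```python
def enumerate_pairs(cc_objects, mlo_objects, is_geometrically_plausible):
    """Stage 1: keep CC/MLO object pairs that satisfy the geometric constraint."""
    return [
        (cc, mlo)
        for cc in cc_objects
        for mlo in mlo_objects
        if is_geometrically_plausible(cc, mlo)
    ]


def similarity_measures(cc, mlo, feature_names):
    """Stage 2 input: assumed measures, absolute difference and mean per feature."""
    measures = []
    for name in feature_names:
        measures.append(abs(cc[name] - mlo[name]))
        measures.append((cc[name] + mlo[name]) / 2.0)
    return measures


def is_true_pair(cc, mlo):
    """Only a TP-TP pair of the same mass is a true pair.

    TP-FP, FP-TP and FP-FP pairs are all labeled as false pairs.
    """
    return cc["mass_id"] is not None and cc["mass_id"] == mlo["mass_id"]


# Toy example with one hypothetical feature ("contrast") and radial distance "r".
cc_objs = [{"r": 40.0, "contrast": 0.8, "mass_id": "m1"},
           {"r": 90.0, "contrast": 0.2, "mass_id": None}]
mlo_objs = [{"r": 42.0, "contrast": 0.6, "mass_id": "m1"},
            {"r": 95.0, "contrast": 0.3, "mass_id": None}]
close_in_radius = lambda a, b: abs(a["r"] - b["r"]) <= 10.0
for cc, mlo in enumerate_pairs(cc_objs, mlo_objs, close_in_radius):
    print(is_true_pair(cc, mlo), similarity_measures(cc, mlo, ["contrast"]))
```

Separating the geometric filter from the similarity computation keeps the number of pairs passed to the classifier small, which is the purpose of the first stage.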

A.  Image  acquisition  and  data  set 

Two  data  sets  of  two-view  mammograms  were  collected  and  separately  used  to  train  and 
test  the  geometric  models  and  our  proposed  two-stage  information  fusion  technique.  These 
mammograms  were  selected  from  patient  files  in  the  Breast  Imaging  Division  at  the 
University  of  Michigan. 
For the geometric modeling of object location on two views, the database consisted of 116
cases with masses, large benign calcifications, or clustered microcalcifications identifiable on
both views of the same breast. The mammograms were digitized with a LUMISYS 85 film
scanner with a pixel size of 50 µm and 12-bit gray levels. The gray levels were calibrated to
be linearly proportional to optical density in the 0.1 to 4.0 O.D. range. The images were
reduced to a pixel resolution of 800 µm × 800 µm by averaging 16 × 16 neighboring pixels
and  down-sampling.  For  each  case,  the  two  standard  mammographic  views  were  available.  A 
total  of  177  objects  were  manually  selected  and  marked  by  an  expert  radiologist  on  each  of 
these  two  views.  The  nipple  location  was  also  identified  for  each  breast  image.  The  radial 
distance  of  the  selected  objects  was  calculated  and  the  prediction  model  of  an  object  location 
in one view from its location in the other view was estimated, as described in Section II.B.
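The resolution reduction step mentioned above (averaging 16 × 16 neighboring pixels and down-sampling, which turns 50 µm pixels into 800 µm pixels) can be sketched in plain Python. The function name `block_average` and the decision to discard incomplete blocks at the image border are assumptions for illustration only.

```python
def block_average(image, block=16):
    """Down-sample a 2-D image by averaging non-overlapping block x block tiles.

    image: list of rows, each a list of gray levels.
    Rows and columns that do not fill a complete tile are dropped (assumption).
    """
    rows, cols = len(image), len(image[0])
    reduced = []
    for r0 in range(0, rows - block + 1, block):
        out_row = []
        for c0 in range(0, cols - block + 1, block):
            total = 0
            for r in range(r0, r0 + block):
                total += sum(image[r][c0:c0 + block])
            out_row.append(total / (block * block))
        reduced.append(out_row)
    return reduced


# Toy example with 2 x 2 tiles so the result is easy to check by eye.
small = [[1, 1, 2, 2],
         [1, 1, 2, 2],
         [3, 3, 4, 4],
         [3, 3, 4, 4]]
print(block_average(small, block=2))
```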

For  the  evaluation  of  the  two-view  mass  detection  scheme,  a  data  set  of  169  pairs  of 
mammograms  containing  masses  on  both  the  CC  and  MLO  views  was  used.  The 
mammograms  were  obtained  from  117  patients,  of  which  128  pairs  were  current 
mammograms  (defined  as  mammograms  from  the  exam  before  biopsy)  and  41  pairs  were 
from exams 1 to 4 years prior to biopsy. Fifty-eight of the 128 current and 26 of the 41 prior image
pairs  contained  a  malignant  mass.  The  338  mammograms  were  also  digitized  with  the 
LUMISYS  85  film  scanner.  The  true  mass  locations  on  both  views  were  identified  and  rated 
by  a  radiologist  approved  by  the  Mammography  Quality  Standards  Act  (MQSA).  The 
histograms  of  the  size  (longest  dimension)  and  the  visibility  (subtlety)  rating  of  the  benign 
and malignant masses contained in this data set are shown in Figs. 1 and 2, respectively. The
subtlety  of  the  masses  was  estimated  subjectively  on  a  10-point  scale  by  the  experienced 
radiologist  relative  to  the  masses  encountered  in  clinical  practice. 

B.  Geometrical  Modeling 

We  will  first  describe  the  geometric  models  that  we  developed  for  predicting  the  location 
of  an  object  in  the  MLO  view  from  that  in  the  CC  view  or  vice  versa.  For  the  purpose  of 
studying  the  geometric  relationship  between  the  locations  of  an  object  imaged  on  the  two 
mammographic  views,  any  identifiable  objects  can  be  used.  We  therefore  chose  two-view 
mammograms  that  contained  masses,  microcalcification  clusters,  and  large  benign 
calcifications  identifiable  on  both  views.  This  data  set  was  different  from  that  used  for  mass 
detection  to  be  described  below.  The  locations  of  the  corresponding  objects  on  the  two  views 
and the nipple locations were identified on the mammograms by the MQSA-approved
radiologist.  For  a  large  object  such  as  a  mass  or  a  microcalcification  cluster,  the  manually 
identified  “centroid”  was  taken  as  its  location.  A  breast  boundary  tracking  program  was  used 
to  segment  the  breast  area  from  the  mammogram.22,  23  Using  the  nipple  location  as  the 
origin,  concentric  circles  were  drawn,  each  of  which  intersected  the  breast  boundary  at  two 
points  and  defined  an  arc.  The  locus  of  the  mid-points  of  these  arcs  was  considered  to  be  the 
breast  midline.  The  breast  length  was  defined  as  the  distance  from  the  nipple  to  the  point 
where  the  midline  intersected  the  chest  wall.  From  these  parameters,  the  polar  coordinates 
(R_x, θ_x), with x = C (CC view) or M (MLO view), as shown in Fig. 3, were defined, where R_x
was the distance from the nipple to the object center and θ_x the angle between R_x and the line
from  the  nipple  to  the  mid-point  of  the  arc  intersecting  the  object.  We  investigated  the 
relationship  between  the  coordinate  of  the  object  on  one  view  and  that  on  the  other  view  in 
this  coordinate  system. 
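A minimal sketch of this nipple-centered coordinate system is given below. It assumes that the nipple, the object center, and the mid-point of the arc through the object are already available as (x, y) points in image coordinates; the function name and the sign convention of the angle are assumptions, since the text does not specify them.

```python
import math


def polar_coordinates(nipple, obj, arc_midpoint):
    """Return (R, theta) of an object relative to the nipple.

    R is the nipple-to-object distance. theta is the signed angle (radians)
    from the nipple-to-arc-midpoint line to the nipple-to-object line;
    the sign convention is an assumption.
    """
    vx, vy = obj[0] - nipple[0], obj[1] - nipple[1]
    mx, my = arc_midpoint[0] - nipple[0], arc_midpoint[1] - nipple[1]
    r = math.hypot(vx, vy)
    cross = mx * vy - my * vx   # proportional to the sine of the angle
    dot = mx * vx + my * vy     # proportional to the cosine of the angle
    theta = math.atan2(cross, dot)
    return r, theta


# Toy example: nipple at the origin, midline direction along the x axis.
r, theta = polar_coordinates((0.0, 0.0), (3.0, 4.0), (5.0, 0.0))
print(r, math.degrees(theta))
```

An object lying on the midline has an angle of zero in this convention, so the angle measures how far the object sits to either side of the midline at its radial distance.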

Scatter  plots  of  the  radial  distance  and  the  angle  of  the  radiologist-identified  objects  on  the 
two  views  in  the  data  set  are  shown  in  Fig.  4  and  Fig.  5,  respectively.  It  can  be  seen  that  there 
is  a  high  correlation  (correlation  coefficient=0.94)  of  the  radial  distances  of  the  corresponding 
objects  in  the  two  views.  However,  the  angular  coordinates  in  the  two  views  are  basically 
uncorrelated  (correlation  coefficient=0.42).  We  therefore  chose  a  linear  model  for  predicting 
the  radial  distance  of  an  object  in  a  second  view  from  that  in  the  first  view: 

Because  of  the  variability  of  the  breast  tissue  caused  by  compression,  the  predicted 
location  for  an  individual  case  could  deviate  from  its  "true"  location,  as  determined  by  the 
radiologist,  by  a  wide  range.  Therefore,  we  estimated  a  global  model  using  a  set  of  training 
cases  with  radiologist-identified  object  locations  on  both  views.  The  model  coefficients  were 
obtained  by  minimizing  the  mean  square  error  between  the  true  and  the  predicted  coordinates 
in  the  second  view.  The  error  in  this  estimation  was  then  used  to  define  an  annular  search 
region, which had a center at a radial distance R_y from the nipple as predicted by the model,
and a width of ±ΔR as estimated from the localization errors observed in the training set.
This  search  region  avoids  using  the  entire  area  of  the  breast  and  eliminates  many 
inappropriate  pairings  between  detected  objects  on  the  CC  view  and  the  MLO  view  in  the 
second  stage,  discussed  in  Section  II.D  below. 
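The fitting and search-region steps can be sketched as follows. The sketch assumes a linear model of the form R_y = a·R_x + b fitted by ordinary least squares, which is consistent with the description above, although the exact form used in the study is not reproduced here. The rule for the half-width ΔR (a multiple `k` of the root-mean-square training residual) is purely an assumption, as are all function and parameter names.

```python
def fit_linear(r_first, r_second):
    """Least-squares fit of r_second = a * r_first + b (assumed model form)."""
    n = len(r_first)
    mean_x = sum(r_first) / n
    mean_y = sum(r_second) / n
    sxx = sum((x - mean_x) ** 2 for x in r_first)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(r_first, r_second))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def search_half_width(r_first, r_second, a, b, k=3.0):
    """Assumed rule: half-width = k times the RMS residual on the training set."""
    residuals = [y - (a * x + b) for x, y in zip(r_first, r_second)]
    rms = (sum(e * e for e in residuals) / len(residuals)) ** 0.5
    return k * rms


def in_annulus(r_candidate, r_first, a, b, half_width):
    """True if a candidate's radial distance lies inside the annular search region."""
    predicted = a * r_first + b
    return abs(r_candidate - predicted) <= half_width


# Toy training data (radial distances in mm, made up for illustration).
cc_r = [10.0, 20.0, 30.0, 40.0]
mlo_r = [12.0, 21.0, 33.0, 41.0]
a, b = fit_linear(cc_r, mlo_r)
delta_r = search_half_width(cc_r, mlo_r, a, b)
print(a, b, delta_r, in_annulus(27.0, 25.0, a, b, delta_r))
```

A larger `k` widens the annulus and lowers the risk of excluding the true match, at the cost of admitting more candidate pairs to the second stage.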
We randomly divided the available data set into a training set and a test set in a 3:1 ratio.
The training set was used for the estimation of the model coefficients and the search region
width. The test set was used for evaluating the prediction accuracy of the model. Four
non-overlapping partitions separating the database into training and test sets were considered. The
model  performance  was  then  obtained  by  combining  the  results  of  the  four  test  sets. 
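The global radial model and its search region can be sketched in a few lines. This is only an illustration under stated assumptions: the source says the model is linear and fitted by minimizing the mean square error, but the presence of an intercept term, the function names, the `coverage` rule used to pick the half width, and the sample distances are all hypothetical.

```python
import math
from statistics import mean


def fit_radial_model(r_first, r_second):
    """Least-squares fit of r_second ~ a * r_first + b.

    The intercept b is an assumption; the source only states that the model is linear.
    """
    mx, my = mean(r_first), mean(r_second)
    sxx = sum((x - mx) ** 2 for x in r_first)
    sxy = sum((x - mx) * (y - my) for x, y in zip(r_first, r_second))
    a = sxy / sxx
    return a, my - a * mx


def half_width(r_first, r_second, a, b, coverage=1.0):
    """Smallest half width that encloses the requested fraction of training errors (assumed rule)."""
    errors = sorted(abs(y - (a * x + b)) for x, y in zip(r_first, r_second))
    k = max(1, math.ceil(coverage * len(errors)))
    return errors[k - 1]


def in_search_region(r_candidate, r_first, a, b, delta_r):
    """True if a candidate's nipple distance lies inside the annulus predicted from the other view."""
    return abs(r_candidate - (a * r_first + b)) <= delta_r


# Hypothetical nipple-to-object distances (pixels) for the same objects in two views.
cc = [10.0, 20.0, 30.0, 40.0]
mlo = [13.0, 21.0, 33.0, 41.0]
a, b = fit_radial_model(cc, mlo)
delta_r = half_width(cc, mlo, a, b)
print(a, b, delta_r)
```

Only the radial coordinate is modeled here, which mirrors the observation that the angular coordinates in the two views are nearly uncorrelated.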


The  geometrical  analysis  is  then  used  for  pairing  objects  detected  on  the  two  views  of  the 
same  breast  in  the  prescreening  stage  of  our  mass  detection  program  as  detailed  below. 

C.  One-view  analysis 

The  one-view  approach  is  used  to  identify  potential  breast  masses  among  the  suspicious 
objects. The one-view prescreening used in this study is similar to that discussed
previously.24, 25, 26 The only difference is that the false positive (FP) reduction step was
modified such that a slightly different object overlap criterion was employed. The block
diagram for the one-view mass detection scheme is shown in Fig. 6. A density-weighted
contrast-enhancement (DWCE) filter is first applied to each digitized mammogram. The
DWCE filter enhances mammographic structures in the breast image. Following this
preprocessing filtering, edge detection is employed to refine the borders of the detected
regions. K-means clustering is then applied to a 25 mm x 25 mm, background-corrected
region of interest centered on each initially detected object to improve the object border. This
segmentation process extracts a large number of objects, including masses and normal breast
structures. In order to reduce the number of non-mass objects, different FP reduction stages
based on morphological features, overlap of the detected regions, and texture features were
designed and trained using an independent set of mammograms in a previous study.26, 27 It
was found that 11 morphological features composed of shape descriptors and 15 spatial gray
level dependence (SGLD) texture features extracted for each object were useful for FP
reduction.28, 29 In this study, rule-based classification using the 11 morphological features
reduced the average number of objects from 37 to about 29 per image and lowered the TP
detection sensitivity from 91.1% to 87.9% at this stage. The 15 texture features were then
used as the input variables for a linear discriminant analysis (LDA) classifier. A texture score
for each object was obtained from the classifier. Overlap reduction was then applied using
these texture scores as discussed below.


During  object  segmentation,  the  border  of  an  object  is  obtained  by  K-means  clustering  in  a 
fixed  sized  region  centered  on  a  “seed”  object.  If  the  seeds  from  two  objects  are  close  to  each 
other, the two segmented objects can overlap each other. This occurs when the two detected
objects are neighboring structures that overlap in the mammographic view or when they are
parts of a single large structure that was initially detected in multiple pieces. An overlap criterion
based  on  the  texture  scores  is  imposed  to  select  one  of  the  two  overlapping  objects  as  a  mass 
candidate.  In  this  study,  we  used  the  shape  of  the  segmented  objects  to  estimate  the 
overlapping  area  between  the  two  neighboring  objects  on  the  mammogram.  An  overlap 
fraction was defined as:

$$\mathrm{Overlap} = \frac{O_1 \cap O_2}{O_1 \cup O_2}$$

where $O_1$ and $O_2$ are the segmented areas of the overlapping objects. A threshold on the
overlap  fraction  was  chosen  such  that  if  the  overlap  fraction  of  two  objects  exceeded  the 
threshold,  the  object  with  the  higher  texture  score  (i.e.,  more  likely  to  be  a  mass  candidate) 
was  kept  and  the  other  was  discarded  as  an  FP.  The  sensitivity  and  the  specificity  of 
differentiating  true  and  false  masses  depend  on  the  selection  of  the  overlap  threshold.  We 
chose an overlap threshold of 15%, which led to an average of 15 objects per image at a
detection  sensitivity  of  about  85%.  As  shown  later  in  the  Results  section,  the  overall 
detection  accuracy  was  relatively  independent  of  the  FP  rate  in  this  intermediate  stage  so  that 
the  selection  of  the  15%  overlap  threshold  was  not  a  critical  factor. 
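The overlap criterion can be illustrated with a short sketch. It assumes each segmented object is available as a set of pixel coordinates and resolves overlaps greedily in order of decreasing texture score; the greedy ordering, the function names, and the rectangular test regions are assumptions made for the example, not details given in the text.

```python
def overlap_fraction(region_a, region_b):
    """Area of the intersection divided by the area of the union of two pixel sets."""
    union = region_a | region_b
    return len(region_a & region_b) / len(union) if union else 0.0


def reduce_overlaps(objects, threshold=0.15):
    """objects: list of (pixel_set, texture_score).

    Visit objects from highest to lowest texture score and keep an object only if
    its overlap fraction with every already kept object does not exceed the threshold.
    """
    kept = []
    for pixels, score in sorted(objects, key=lambda o: o[1], reverse=True):
        if all(overlap_fraction(pixels, k[0]) <= threshold for k in kept):
            kept.append((pixels, score))
    return kept


def rect(x0, y0, width, height):
    """Hypothetical helper: a filled rectangle as a set of (x, y) pixels."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


a = rect(0, 0, 10, 10)
b = rect(5, 0, 10, 10)   # shares a 5 x 10 strip with a
c = rect(30, 30, 5, 5)   # isolated object
kept = reduce_overlaps([(a, 0.9), (b, 0.4), (c, 0.2)])
print([score for _, score in kept])
```

In this toy case the overlap fraction of `a` and `b` is 50/150, which exceeds the 15% threshold, so only the object with the higher texture score survives.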

After  overlap  reduction,  our  current  one-view  algorithm  employed  a  final  stage  of  FP 
reduction  based  on  the  texture  scores,  as  illustrated  in  the  block  diagram  in  Fig.  7.  A  decision 
threshold  was  applied  to  the  texture  scores  such  that  objects  with  scores  lower  than  the 
threshold  were  excluded  as  FPs.  In  addition,  another  criterion  was  imposed  so  that  no  more 
than  three  objects  were  kept  on  each  image.  By  comparing  the  retained  objects  with  the  true 
mass locations on each image for a range of decision thresholds, an FROC curve
characterizing the sensitivity as a function of the number of FPs per image could be generated.
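The construction of the FROC curve from the retained objects can be sketched as follows. The data layout (a list of per-image dictionaries in which each object carries a score and either a mass identifier or `None` for an FP) is a hypothetical choice for illustration; the cap of three objects per image and the exclusion of objects scoring below the threshold follow the description above.

```python
def froc_points(images, thresholds, max_per_image=3):
    """Return (threshold, sensitivity, FPs per image) for each decision threshold.

    images: list of {"n_masses": int, "objects": [(score, mass_id or None), ...]}
    """
    total_masses = sum(img["n_masses"] for img in images)
    points = []
    for t in thresholds:
        tp = fp = 0
        for img in images:
            ranked = sorted(img["objects"], key=lambda o: o[0], reverse=True)
            # Keep at most max_per_image objects, then drop those scoring below t.
            retained = [o for o in ranked[:max_per_image] if o[0] >= t]
            tp += len({mass_id for _, mass_id in retained if mass_id is not None})
            fp += sum(1 for _, mass_id in retained if mass_id is None)
        points.append((t, tp / total_masses, fp / len(images)))
    return points


# Two hypothetical images, each containing one true mass.
images = [
    {"n_masses": 1, "objects": [(0.9, "m1"), (0.6, None), (0.3, None), (0.2, None)]},
    {"n_masses": 1, "objects": [(0.5, None), (0.4, "m2")]},
]
for point in froc_points(images, [0.0, 0.45, 0.8]):
    print(point)
```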

D. Two-view analysis

The block diagram in Fig. 7 illustrates our two-view mass detection scheme and its
relationship  to  our  current  one-view  approach.  The  detection  algorithm  described  above  was 
used  as  a  prescreening  stage  in  our  two-view  fusion  approach.  The  only  difference  was  that 
the  operating  threshold  that  limits  the  maximum  number  of  objects  on  an  image  was  relaxed 
to increase sensitivity while retaining a larger number of FPs. The objects remaining after this
threshold will still be referred to as the prescreening objects in the following discussions. To
investigate the dependence of the overall detection accuracy of our two-view detection
scheme on the initial number of prescreening objects, three different decision thresholds were
selected to obtain a maximum of either 5, 10, or 15 objects per image.

To perform the two-view information fusion analysis, an expanded set of
morphological features was extracted from each prescreening object. These morphological
features included the 11 shape descriptors discussed previously, as well as 13 new contrast
measures and 7 new shape features. In order to evaluate the new method, we randomly divided the
available  cases  into  a  training  and  a  test  set  using  a  3:1  training/test  ratio.  The  training  set  was 
used  to  select  a  subset  of  useful  morphological  features  using  stepwise  feature  selection  and  to 
estimate  the  coefficients  of  an  LDA  classifier.  To  reduce  biases  in  the  classifier,  50  random 
3:1  partitions  of  the  cases  were  employed.  A  morphological  score  was  obtained  for  each 
individual  object  by  averaging  the  test  score  of  the  object  obtained  from  the  different 
partitions.  The  morphological  score  was  then  combined  with  the  one-view  texture  score  by 
averaging  the  two  scores.  A  single  combined  score  thus  characterized  each  prescreening 
object.  This  one-view  score  was  further  fused  with  the  discriminant  score  obtained  by  the 
two-view  scheme,  as  described  below. 
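The averaging of test scores over repeated random 3:1 partitions can be sketched as below. To keep the example self-contained, a simple projection onto the difference of the class means stands in for the LDA classifier with stepwise feature selection used in the study; that stand-in, the function names, the data layout, and the synthetic features are all assumptions.

```python
import random
from statistics import mean


def train_mean_difference(samples):
    """Stand-in for LDA: score = projection onto (mean of masses - mean of non-masses).

    samples: list of (feature_vector, label) with label 1 = mass, 0 = non-mass.
    """
    pos = [f for f, y in samples if y == 1]
    neg = [f for f, y in samples if y == 0]
    w = [mean(cp) - mean(cn) for cp, cn in zip(zip(*pos), zip(*neg))]
    return lambda f: sum(wi * fi for wi, fi in zip(w, f))


def averaged_test_scores(cases, n_partitions=50, test_fraction=0.25, seed=0):
    """cases: dict case_id -> list of (object_id, feature_vector, label).

    Cases (not individual objects) are split 3:1; each object's final score is the
    mean of the scores it received whenever its case fell in a test set.
    """
    rng = random.Random(seed)
    case_ids = sorted(cases)
    n_test = max(1, round(test_fraction * len(case_ids)))
    collected = {}
    for _ in range(n_partitions):
        test_ids = set(rng.sample(case_ids, n_test))
        train = [(f, y) for c in case_ids if c not in test_ids
                 for _oid, f, y in cases[c]]
        score = train_mean_difference(train)
        for c in test_ids:
            for obj_id, f, _label in cases[c]:
                collected.setdefault(obj_id, []).append(score(f))
    return {obj_id: mean(values) for obj_id, values in collected.items()}


# Synthetic data: eight cases, each with one mass and one non-mass object.
cases = {
    f"c{i}": [(f"c{i}-mass", (2.0 + 0.1 * i, 1.0), 1),
              (f"c{i}-other", (0.5, 1.0 + 0.1 * i), 0)]
    for i in range(8)
}
scores = averaged_test_scores(cases)
print(len(scores))
```

Splitting by case rather than by object keeps both views of a breast on the same side of the partition, which is what prevents a test object from influencing its own score.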

The prescreening objects were analyzed by the two-view method shown in the right branch
of the diagram in Fig. 7. All possible pairings between the prescreening objects in the two
views of the same breast were determined using the distance from the nipple to the centroid of
each object and the geometrical model described above. Since the location of a given object
detected in one view cannot be uniquely identified in the other view, as described in Section
II.B, an object was initially paired with all objects with centroids located within its defined
annular region in the other view. The geometric constraints reduced the number of object
pairs that needed to be classified as true or false correspondences in the subsequent steps. A
true pair (TP-TP) was defined as the correspondence between the same true mass on the two
mammographic views, and a false pair was defined as any other object pairing (TP-FP, FP-TP
and  FP-FP).  For  each  object  pair,  the  set  of  15  texture  and  31  morphological  features 
described  above  were  used  to  form  similarity  measures.  In  this  preliminary  study,  two  simple 
measures,  the  absolute  difference  and  the  mean,  were  used.  A  total  of  30  texture  measures  and 
62  morphological  measures  were  thus  obtained  for  each  object  pair.  The  absolute  difference 
between  the  nipple-to-object  distances  in  the  CC  and  MLO  views  was  also  included  in  both 
the  texture  and  morphological  feature  sets  as  a  feature  for  differentiating  true  from  false 
object  pairs.  Two  separate  LDA  classifiers  with  stepwise  feature  selection  were  trained  to 
classify  the  true  and  false  pairs  using  the  similarity  features  in  the  morphological  and  texture 
feature  spaces,  respectively. 
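The geometric pairing and the two similarity measures can be sketched together. The sketch pairs objects in the CC-to-MLO direction only and represents each object as a dictionary with a nipple distance `r` and a feature list; the dictionary layout, the function names, and the numeric values are hypothetical, and the model coefficients `a`, `b` and half width `delta_r` are assumed to come from a previously fitted radial model.

```python
def candidate_pairs(cc_objects, mlo_objects, a, b, delta_r):
    """Pair each CC object with every MLO object whose nipple distance lies in its annulus."""
    pairs = []
    for cc in cc_objects:
        predicted = a * cc["r"] + b
        for mlo in mlo_objects:
            if abs(mlo["r"] - predicted) <= delta_r:
                pairs.append((cc, mlo))
    return pairs


def similarity_features(cc, mlo):
    """Absolute difference and mean of each feature, plus the nipple-distance difference."""
    diffs = [abs(x - y) for x, y in zip(cc["features"], mlo["features"])]
    means = [(x + y) / 2 for x, y in zip(cc["features"], mlo["features"])]
    return diffs + means + [abs(cc["r"] - mlo["r"])]


# Hypothetical objects with 15 texture features each.
cc_objects = [{"r": 100.0, "features": [1.0] * 15}, {"r": 300.0, "features": [2.0] * 15}]
mlo_objects = [{"r": 110.0, "features": [1.5] * 15},
               {"r": 320.0, "features": [2.5] * 15},
               {"r": 500.0, "features": [0.5] * 15}]

pairs = candidate_pairs(cc_objects, mlo_objects, a=1.0, b=0.0, delta_r=40.0)
vector = similarity_features(*pairs[0])
print(len(pairs), len(vector))
```

With 15 texture features per object, each pair yields 30 similarity measures plus the distance difference, matching the feature counts described above.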

For  training  the  classifiers,  the  data  set  was  randomly  divided  into  a  training  set  and  a  test 
set  again  using  a  3:1  training/test  ratio.  Fifty  random  3:1  partitions  of  the  cases  were  used  to 
reduce  bias.  Individual  morphological  and  texture  scores  were  obtained  for  each  object  pair 
by averaging the test scores of each object pair obtained from the different partitions. The
two classification scores were then averaged to obtain one “correspondence” score for each
object pair. This score, along with the one-view prescreening score, was used in the following
fusion step.

E.  Fusion  analysis 


The  fusion  of  the  one-view  prescreening  scores  with  the  two-view  correspondence  scores 
was  the  final  step  in  our  two-view  detection  scheme.  In  this  study,  we  designed  a  fusion 
scheme  that  combines  ranking  and  averaging  of  the  prescreening  and  correspondence  scores. 
We  first  ranked  all  prescreening  object  scores  within  a  given  film  from  the  largest  to  the 
smallest.  The  correspondence  scores  were  ranked  in  a  similar  way.  These  two  new  rank 
scores  were  then  merged  into  a  single  score  for  each  object  in  each  view.  Since  an  object 
could  have  more  than  one  correspondence  score,  its  two-view  correspondence  score  was  taken 
to be the maximum correspondence score among all object pairs in which this object was a
member. There can be many variations for the fusion step.31, 32 In this preliminary study,
the final discriminant score for an object was obtained by averaging its two-view
correspondence score rank with its one-view prescreening score rank.

The FROC performance curve for the two-view analysis was generated by varying the
decision threshold on the final discriminant score for each object and determining the
sensitivity and FP per image at each threshold. We compared the FROC performance
curves obtained by the two-view scheme when starting with 5, 10, and 15 prescreening
objects per image and that obtained with the one-view detection scheme.
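The rank-and-average fusion of Section II.E can be sketched for the objects of a single film. Several details are assumptions made for the example: rank 1 is given to the largest score so that a smaller fused value means a more suspicious object, ties are broken by object identifier, and an object that belongs to no pair receives the worst correspondence rank.

```python
def ranks_descending(scores):
    """Map object id -> rank, with rank 1 for the largest score (ties broken by id)."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {k: i + 1 for i, k in enumerate(ordered)}


def fuse(prescreen, pair_scores):
    """Average the one-view rank and the two-view correspondence rank of each object.

    prescreen: object id -> one-view prescreening score for one film.
    pair_scores: list of (object id in this film, correspondence score of one pair).
    """
    best = {k: float("-inf") for k in prescreen}
    for obj_id, s in pair_scores:
        # An object's correspondence score is the maximum over all pairs it belongs to.
        best[obj_id] = max(best[obj_id], s)
    r_one = ranks_descending(prescreen)
    r_two = ranks_descending(best)
    return {k: (r_one[k] + r_two[k]) / 2 for k in prescreen}


prescreen = {"a": 0.9, "b": 0.7, "c": 0.4}
pair_scores = [("a", 0.2), ("b", 0.8), ("b", 0.5), ("c", 0.6)]
print(fuse(prescreen, pair_scores))
```

Working on ranks rather than raw scores avoids having to put the one-view and two-view classifier outputs on a common scale before averaging.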

III. Results

A.  Geometrical  Modeling 

In  the  geometrical  analysis  experiments,  we  first  estimated  a  prediction  model  of  the  radial 
distance  of  an  object  in  a  second  view  from  its  radial  distance  in  the  first  view  using  the 
training  set.  The  model  was  then  used  to  predict  object  location  from  one  view  to  the  other 
for  the  independent  test  cases.  Since  the  model  did  not  provide  an  exact  solution,  a  search 
region, R±ΔR, where R was the predicted radial distance and ΔR the half width of an annular
region, was defined. The percentage of the true object centroids enclosed within the search
region was measured as a function of the size of 2ΔR. Fig. 8 shows the prediction accuracy as
a function of 2ΔR for estimating the object radial distance in the MLO view from that in the
CC view. Fig. 9 shows the corresponding results for predicting the object radial distance in
the CC view from that in the MLO view. The training and test curves almost overlap in each
case. The difference in the accuracy between searching the object centers in the CC or MLO
views is small. About 83% of the object centers are within the search region when the radial
width  of  the  search  region  is  about  40  pixels  (32  mm)  for  either  the  CC  view  or  the  MLO 
view.  These  results  indicate  that  the  search  region,  although  large,  is  much  smaller  than  the 
entire  area  of  the  breast.  The  limited  search  region  size  reduces  the  number  of  object  pairs  to 
be analyzed in the two-view detection scheme. To avoid missing any pairs of true masses in
the two-view scheme, we chose to set the radial width of the annular search region to about 80
pixels. This led to a larger number of false pairs, but the number was still substantially smaller
than if the entire breast area had been considered.
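The accuracy curves of Figs. 8 and 9 plot the fraction of true centroids enclosed by the annulus against its full radial width. A minimal sketch of that measurement, with hypothetical distances in pixels and an assumed function name, is:

```python
def coverage_curve(r_true, r_predicted, full_widths):
    """Fraction of objects whose true radial distance lies within half of each full width
    of the predicted distance."""
    errors = [abs(t - p) for t, p in zip(r_true, r_predicted)]
    return [sum(e <= w / 2 for e in errors) / len(errors) for w in full_widths]


# Hypothetical true and predicted nipple-to-object distances (pixels).
r_true = [100.0, 150.0, 210.0, 300.0]
r_predicted = [110.0, 145.0, 250.0, 290.0]
print(coverage_curve(r_true, r_predicted, [10.0, 20.0, 80.0]))
```

Reading such a curve at a chosen width gives the trade-off discussed above: a wider annulus encloses more true pairs but also admits more false pairs.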

B. One-View Analysis

The FROC curve obtained from our current one-view mass detection algorithm26 applied
to  the  data  set  of  338  images  is  shown  in  Fig.  10.  The  FROC  curves  for  detection  of  the 
malignant  masses  on  the  current  and  prior  mammograms  are  also  plotted  for  comparison. 

In clinical application, if the mass is detected on one view by the computer and the
radiologist  is  alerted  to  the  mass,  the  radiologist  will  likely  find  the  mass  on  the  other  view,  if 
it  is  visible,  even  if  the  CAD  algorithm  misses  it  on  the  other  view.  Some  researchers 
therefore  consider  a  true-positive  as  the  detection  of  the  mass  on  one  or  two  views  of  the 
breast.  We  refer  to  this  as  case-based  analysis.  In  this  situation,  the  total  number  of  masses  or 
cases  in  this  study  was  169.  For  comparison  purposes,  we  plot  the  case-based  FROC  curves 
for  all  masses,  malignant  masses  on  current  mammograms,  and  malignant  masses  on  prior 
mammograms  in  Fig.  11. 

C.  Fusion  Analysis 

Three  different  decision  thresholds  that  retained  a  maximum  of  5,  10,  and  15  objects  per 
image  after  the  one-view  prescreening  stage  were  used  to  select  mass  candidates  as  inputs  to 
the  two-view  detection  scheme.  Table  I  summarizes  the  characteristics  of  these  three  object 
sets.  The  average  number  of  prescreening  objects  per  image  was  smaller  than  the  maximum 
number  allowed  per  image  because  the  total  number  of  objects  in  some  images  was  smaller 
than  the  maximum  number. 

The  FROC  curves  for  the  detection  of  malignant  and  benign  masses  on  each  image,  using 
our  two-view  fusion  technique,  are  shown  in  Fig.  12.  The  curves  are  similar  for  the  three 
thresholds of 5, 10, and 15 prescreening objects per image. This similarity also holds for the
FROC  curves  for  detection  of  malignant  masses  as  illustrated  in  Fig.  13.  The  improvement  in 
detection  by  our  current  two-view  fusion  method  therefore  seems  to  be  independent  of  the 
operating  threshold  when  the  maximum  number  of  objects  retained  per  image  in  the 
prescreening  stage  is  between  5  and  15. 

Fig. 14 compares the film-based FROC curves for detection of malignant masses by the
one-view and two-view fusion methods obtained with 10 prescreening
objects per image. Fig. 15 compares the corresponding case-based FROC curves. A
comparison of the detection sensitivity at 1 FP/image between the one-view and two-view
fusion methods is given in Table II for both film-based and case-based detection.

IV. Discussion


In  this  work,  we  propose  a  new  technique  based  on  fusion  of  one-view  and  two-view 
information  to  improve  the  performance  of  mammographic  mass  detection.  The  results  of  our 
preliminary  study  show  that  including  correspondence  information  from  two  mammographic 
views  is  an  effective  technique  for  reducing  FPs.  At  a  case-based  detection  sensitivity  of  75% 
for all masses, the number of FPs per image was reduced from 1.5 FPs/image using the
one-view detection technique to 1.13 FPs/image using the two-view fusion technique. The results
also  indicate  that  our  proposed  method  is  more  effective  in  reducing  FPs  in  the  subset  of  cases 
containing  malignant  masses  on  current  mammograms.  At  a  case-based  sensitivity  of  85%  for 
malignant  masses  on  current  mammograms,  the  number  of  FPs  per  image  was  reduced  from 
1.5  FPs/image  to  0.5  FPs/image  using  the  two-view  fusion  technique  (Fig.  15).  Alternatively, 
at 1 FP/image, the two-view algorithm achieved a case-based detection sensitivity of 91%,
whereas the current one-view scheme had a 77% sensitivity at the same number of FPs/image
(Table II).

The  two-view  correspondence  analysis  is  more  useful  for  mammogram  pairs  for  which  the 
mass  is  detected  on  both  views  in  the  prescreening  stage.  The  fusion  process  is  designed  to 
both  increase  the  scores  for  the  TPs  and  reduce  the  scores  for  FPs  for  such  cases.  For  the  data 
set  of  169  pairs  of  mammograms  under  the  condition  of  10  prescreening  objects  per  image, 
the  mass  was  detected  on  both  CC  and  MLO  views  in  a  subset  of  120  cases  and  on  only  one 
view  in  another  subset  of  32  cases.  If  we  analyzed  the  subset  of  cases  in  which  the  mass  was 
detected  in  both  views,  at  1  FP/image,  the  case-based  detection  sensitivity  increased  from 
82.5%  for  the  current  one-view  algorithm  to  93.3%  using  the  two-view  fusion  technique. 
However,  for  the  subset  of  cases  in  which  the  mass  was  detected  on  only  one  view  at  the 
prescreening stage, the fusion analysis reduced the scores for TPs. At 1 FP/image, the
case-based detection sensitivity was reduced from 50% for the current one-view algorithm to
43.7%  using  the  two-view  fusion  process.  Similar  trends  for  the  detection  results  were 
observed  when  5  and  15  objects  per  image  were  retained  in  the  prescreening  stage. 
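As a rough consistency check, the subset sensitivities quoted above can be converted back into case counts. The sketch assumes that each percentage is a rounded ratio of whole cases within its subset; the variable and function names are hypothetical.

```python
n_cases, n_both_views, n_one_view = 169, 120, 32


def detected(sensitivity_percent, n):
    """Number of detected cases implied by a sensitivity quoted in percent."""
    return round(sensitivity_percent / 100 * n)


both_one_view = detected(82.5, n_both_views)   # one-view algorithm, mass found on both views
both_two_view = detected(93.3, n_both_views)   # two-view fusion, same subset
single_one_view = detected(50.0, n_one_view)   # one-view algorithm, mass found on one view only
single_two_view = detected(43.7, n_one_view)   # two-view fusion, same subset
missed_at_prescreening = n_cases - n_both_views - n_one_view

print(both_one_view, both_two_view, single_one_view, single_two_view, missed_at_prescreening)
```

Under that assumption the fusion step gains 13 cases in the larger subset and loses 2 in the smaller one, while 17 of the 169 cases have no prescreening detection on either view and cannot be recovered by either method.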

In  this  study,  we  chose  the  radial  width  of  the  annular  search  region  to  be  80  pixels  for  all 
mammograms.  This  radial  width  reduced  the  search  region  to  only  a  fraction  of  the  breast 
area for large breasts, but it covered most of the breast area in smaller breasts. Therefore, the
advantage  of  geometric  correlation  has  not  been  fully  utilized  in  small  breasts.  One  approach 
to  reducing  the  search  region  size  for  small  breasts  would  be  to  choose  the  region  size  as  a 
percentage  of  the  breast  area  so  that  the  actual  width  of  the  annular  region  will  be  different  for 
each  pair  of  mammograms.  This  will  lead  to  a  reduction  in  the  number  of  false  object  pairs  for 
small breasts. A second approach would be to use a third mammographic view when it is
available. As we discussed previously, using the three standard views (CC, MLO, and
lateral) of the breast allows more accurate localization of a lesion to within a small fan-shaped
region. This approach would require further adaptation of our two-view scheme to a
three-view fusion scheme. Although 3-view mammograms are not generally available for screening,
it  will  be  of  interest  to  investigate  how  3-view  mammograms  will  improve  the  detection  of 
malignancy  in  the  breast  by  the  computer. 

In  this  study,  we  used  radiologist-identified  nipple  locations  for  the  geometric  correlation 
process.  In  a  fully  automated  mass  detection  program,  this  step  will  have  to  be  automated. 
We  are  developing  an  automated  nipple  detection  program.  This  detection  program  could 
identify  the  nipple  within  1  cm  of  the  true  location  in  88%  of  the  311  mammograms  in  a  study 
set.23 For the purpose of this study, we did not use automated nipple detection because it would
complicate our analysis of the two-view fusion techniques if errors in nipple detection had to
be taken into account. We therefore excluded these effects by using manually identified
nipple locations. We will continue to improve the automated nipple detection algorithm and
incorporate  this  step  into  the  two-view  mass  detection  scheme  in  the  future. 

In  this  preliminary  study,  we  used  two  simple  similarity  measures  for  classification  of 
object  correspondence.  The  fusion  of  the  two-view  and  one-view  scores  for  the  individual 
objects was performed with relatively simple ranking and averaging methods. These
approaches  already  provided  substantial  improvement  in  the  detection  accuracy,  indicating  the 
promise  of  the  two-view  method  for  mass  detection  and  FP  reduction.  Further  studies  are 
being  conducted  to  optimize  the  various  steps  in  the  two-view  classification  and  fusion 
schemes. 

V. Conclusion

We are developing a two-view fusion technique to improve computerized mass
detection on mammograms. Starting from objects detected in a prescreening stage, we defined
all possible pairings based on geometry and then combined morphological and textural
characteristics  from  these  paired  objects  into  a  correspondence  score  for  each  object.  A 
classifier  was  trained  to  differentiate  the  true  mass  pairs  from  the  false  pairs.  A  final  fusion 
stage  combined  the  two-view  object  pair  information  with  the  one-view  object  scores.  Our 
preliminary  results  demonstrate  that  the  proposed  two-view  scheme  can  reduce  FPs  in 
comparison  with  our  current  one-view  method.  The  mass  detection  sensitivity  is  also 
improved by using information from the two views. Further studies are underway to optimize
the prescreening process, the design of the similarity measures, as well as the two-view
fusion scheme. When fully developed and integrated into the CAD system, it is expected that
our proposed two-view technique will improve upon the current one-view scheme and
provide a useful second opinion to radiologists in the detection of breast cancer on
mammograms.

Figure Captions


Fig. 1: Histograms of the size (the longest dimension) of the benign and malignant masses
contained in the data set of 338 one-view mammograms and rated by an MQSA-approved
radiologist. Eight masses in the prior mammograms of the data set did not receive a
rating because the radiologist could not delineate the mass even in retrospect,
although a focal density could be seen.

Fig. 2: Histograms of the visibility (1 = most obvious, 10 = subtlest) of the benign and
malignant masses contained in the data set of 338 one-view mammograms and rated
by an MQSA-approved radiologist. Eight masses in the prior mammograms of the data set did
not receive a rating because the radiologist could not delineate the mass even in
retrospect, although a focal density could be seen.

Fig. 3: Example of the coordinate system used to localize an object in a mammographic view.
An automatic boundary tracking process is used to segment the breast. The nipple
location was identified by an MQSA-approved radiologist. The distance of the object
from the nipple location is defined by R=||MN||. The angle of the mass from the
midline of the breast is defined by the angle between the vectors MN and ON.

Fig.  4:  CC  view  versus  MLO  view  of  the  radial  distances  of  the  identified  objects  from  the 
nipple  location. 

Fig.  5:  CC  view  versus  MLO  view  of  the  angular  coordinates  of  the  identified  objects  from 
the  breast  midline. 

Fig.  6:  Schematic  diagram  for  the  current  one-view  prescreening  detection  algorithm. 

Fig.  7:  Schematic  diagram  for  the  proposed  two-view  fusion  scheme. 

Fig.  8:  Prediction  of  the  center  of  an  object  in  the  MLO  view  from  its  location  in  the  CC  view. 
Training  and  test  performances  are  given  as  a  function  of  the  radial  width  of  the 
annular  search  region. 

Fig.  9:  Prediction  of  the  center  of  an  object  in  the  CC  view  from  its  location  in  the  MLO  view. 
Training  and  test  performances  are  given  as  a  function  of  the  radial  width  of  the 
annular  search  region. 

Fig.  10:  Film-based  performances  of  the  current  one-view  mass  detection  algorithm  applied  to 
the  data  set  of  338  one-view  (169  pairs)  mammograms.  The  FROC  curves  are  plotted 
for detection of all malignant and benign masses, and of the malignant masses on the
current  and  the  prior  mammograms.  Higher  sensitivity  was  obtained  for  the  detection 
of  malignant  masses  on  current  mammograms. 

Fig.  11:  Case-based  performances  of  the  current  one-view  mass  detection  algorithm  applied  to 
the  data  set  of  169  pairs  of  mammograms.  The  FROC  curves  are  plotted  for  detection 
of  all  malignant  and  benign  masses,  and  of  the  malignant  masses  on  the  current  and 
the  prior  mammograms.  Higher  sensitivity  was  obtained  for  the  detection  of  malignant 
masses  on  current  mammograms. 

Fig.  12:  Film-based  performances  of  the  proposed  two-view  detection  scheme  for  all  masses. 
Three  initial  conditions  depending  on  the  maximum  number  of  retained  objects  per 
image  (5, 10,  and  15  objects  per  image)  at  the  prescreening  stage  were  evaluated. 

Fig.  13:  Film-based  performances  of  the  proposed  two-view  detection  scheme  applied  to  the 
current  malignant  masses.  Three  initial  conditions  depending  on  the  maximum  number 
of  retained  objects  per  image  (5,  10,  and  15  objects  per  image)  at  the  prescreening 
stage  were  evaluated. 

Fig. 14: Comparison of the film-based performance of the one-view and two-view detection
methods  for  the  detection  of  malignant  masses  on  current  mammograms  and  prior 
mammograms. 

Fig. 15: Comparison of the case-based performance of the one-view and two-view detection
methods  for  the  detection  of  malignant  masses  on  current  mammograms  and  prior 
mammograms. 
Table I: Characteristics of the 3 sets of objects to be input to the two-view scheme. The
objects were obtained by applying a detection threshold at the prescreening stage to extract a
maximum of 5, 10, and 15 objects per image.

Prescreening threshold   Avg.          Sensitivity,     Sensitivity,     No. of
(objs/image)             objs/image    film-based (%)   case-based (%)   pairs/case
5                        4.9           72.7             85.2             14.2
10                       9.4           79.8             89.3             49.4
15                       12.6          83.4             92.3             85.9

Table II: Comparison of detection sensitivities obtained by the one-view and the two-view
fusion schemes for film-based and case-based detection.

                     Sensitivity, film-based    Sensitivity, case-based
                     (1 FP/image)               (1 FP/image)
Mass type            1-view      2-view         1-view      2-view
All                  50%         56%            67%         73%
Current malignant    62%, 91% (only two of the four entries are legible; their columns are not recoverable)
Prior malignant      27%         33%            42%         54%
Analysis of temporal changes of mammographic features: Computer-aided
classification of malignant and benign breast masses

Lubomir Hadjiiski, Berkman Sahiner, Heang-Ping Chan, Nicholas Petrick, Mark A. Helvie,
and Metin Gurcan

Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904
(Received 22 May 2001; accepted for publication 27 August 2001)

A new classification scheme was developed to classify mammographic masses as malignant and
benign by using interval change information. The masses on both the current and the prior
mammograms were automatically segmented using an active contour method. From each mass, 20 run
length statistics (RLS) texture features, 3 spiculation features, and 12 morphological features
were extracted. Additionally, 20 difference RLS features were obtained by subtracting the prior
RLS features from the corresponding current RLS features. The feature space consisted of the
current RLS features, the difference RLS features, the current and prior spiculation features,
and the current and prior mass sizes. Stepwise feature selection and linear discriminant
analysis classification were used to select and merge the most useful features. A
leave-one-case-out resampling scheme was used to train and test the classifier using 140
temporal image pairs (85 malignant, 55 benign) obtained from 57 biopsy-proven masses (33
malignant, 24 benign) in 56 patients. An average of 10 features were selected from the 56
training subsets: 4 difference RLS features, 4 RLS features and 1 spiculation feature from the
current image, and 1 spiculation feature from the prior image were most often chosen. The
classifier achieved an average training A_z of 0.92 and a test A_z of 0.88. For comparison, a
classifier was trained and tested using features extracted from the 120 current single images.
This classifier achieved an average training A_z of 0.90 and a test A_z of 0.82. The
information on the prior image significantly (p = 0.015) improved the accuracy for
classification of the masses. © 2001 American Association of Physicists in Medicine.
[DOI: 10.1118/1.1412242]

Key  words:  computer-aided  diagnosis,  interval  change,  classification,  feature  analysis, 
mammography,  malignancy 


I.  INTRODUCTION 

Mammography is currently the most effective method for early breast cancer detection.
Analysis of interval changes is an important method used by radiologists in mammographic
interpretation to detect developing malignancy. A variety of computer-aided diagnosis (CAD)
techniques have been developed to detect abnormalities and to distinguish malignant and
benign lesions on mammograms. We are studying the use of CAD techniques to assist
radiologists in interval change analysis.

Commonly used lesion classification methods for CAD employ information from a single image.
These methods have been shown to perform well in lesion classification problems. However,
when mammograms from multiple examinations are available, it can be expected that even
higher accuracy may be achieved if the computer can utilize the interval change information
for classification. New computer vision methods will have to be designed to extract features
characterizing temporal changes and to improve the differentiation between benign and
malignant masses.

A number of researchers have developed algorithms to register the mass on current and prior
mammograms. Sallam et al. have proposed a warping technique for mammogram registration based
on manually identified control points. A mapping function was calculated for matching each
point on the current mammogram to a point on the prior mammogram. Brzakovic et al. have
investigated a three-step method for comparison of the most recent and the prior mammograms.
They first registered two mammograms using the method of principal axis, and partitioned the
current mammogram using a hierarchical region-growing technique. Translation, rotation, and
scaling were then used for registration of the partitioned regions. Vujovic et al. have
proposed a multiple-control-point technique for mammogram registration. They first determined
several control points independently on the current and prior mammograms based on the
intersection points of prominent anatomical structures in the breast. A correspondence
between these control points was established based on a search in a local neighborhood around
the control point of interest.

The previous techniques depend on the identification of control points. Furthermore, these
studies aimed at registration without using the results for interval change analysis. Gopal
et al. and Hadjiiski et al. have developed a multistage technique that defines a
transformation to locally map the position of the mass on a current mammogram to a search
region on the prior mammogram. A local search for the exact mass location is then performed
on the prior mammogram. Good et al. have developed a technique that defines a transformation
to map all points from the current mammogram onto a prior mammogram. The current mammogram is
then subtracted from the prior mammogram.
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2  Hadjiiski  et  al.:  Analysis  of  temporal  changes  of  mammographic  features 


Few studies have been performed so far in the area of automated classification of breast
masses based on the interval change information. Gopal et al. and Hadjiiski et al. have
carried out a preliminary study of the classification scheme that combines prior and current
information automatically extracted from masses on prior and current mammograms,
respectively. The classifier using the combined prior and current information performed
better than the classifier using current information alone. To our knowledge, no other
studies that describe automated classification of malignant and benign breast lesions based
on temporal changes of mammographic features have been reported.

The goal of our research is to develop a CAD method for automated analysis of interval
changes to be used as an aid to radiologists for detection and classification of malignant
and benign lesions on mammograms. In this study, we conducted a preliminary investigation to
demonstrate the feasibility of analyzing temporal differences in the texture and
morphological features between a mass on the most recent mammogram and a prior mammogram of
the same view for the classification task. Additionally, we compared this method with two
other classification methods, one based on information extracted from the current mammograms
alone and the other based on information extracted from the prior mammograms alone.


II.  MATERIALS  AND  METHODS 

Fig. 1. Block diagram of the classification method.

The new classification technique is based on the design of features that characterize the
temporal change in the lesion of interest between two mammographic examinations. The mass to
be analyzed can either be identified manually by a radiologist or automatically by a
computerized detection program. In this study, the mass on each mammogram was identified by
an MQSA certified radiologist. The masses on both the current and the prior mammograms were
automatically segmented using an active contour method that has been discussed in detail
elsewhere. Examples of the segmentation are shown in Figs. 2 and 3 for a malignant and a
benign mass, respectively. Features that characterized mammographic masses including texture
features, morphological features, and spiculation features were extracted from each mass.
Three of the morphological features are related to the mass size. Additionally, difference
features were obtained by subtracting a feature of the prior mass from the corresponding
feature of the current mass. The current, prior, and difference features formed a
multidimensional feature space for the classification task. Stepwise feature selection
applied to linear discriminant analysis (LDA) was used to select the most useful features.
The selected features were then used as the input predictor variables for the LDA classifier
(Fig. 1). The classifier was trained and tested by a leave-one-case-out resampling scheme. A
case was considered to contain all regions of interest from a given patient. In each
resampling step, the temporal pairs from 55 cases were used for feature selection and
formulation of the linear discriminant function, while the temporal pairs from the left-out
case were used for testing the trained classifier. A total of 56 training and testing steps
were obtained from the 56 cases. The classification results from the 56 test cases were
accumulated to evaluate the classifier performance. Since the data set in this study was
still small, we chose the feature selection parameters such that the dimensionality of the
input feature vector for the LDA classifier was small in order to reduce the possibility of
over-training. The feature selection procedure is discussed in Sec. II C.
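The leave-one-case-out resampling described above can be sketched as follows. This is a minimal illustration, not the authors' software: the function name `leave_one_case_out`, the tuple layout `(case_id, features, label)`, and the toy data are assumptions made for the example. The point it shows is that all temporal pairs from one patient are held out together, so feature selection and classifier design never see the test case.

```python
from collections import defaultdict


def leave_one_case_out(pairs):
    """Yield (left_out_case, train_pairs, test_pairs) for every case.

    `pairs` is a list of (case_id, features, label) tuples. All temporal
    pairs sharing a case_id are held out together in one resampling step.
    """
    by_case = defaultdict(list)
    for pair in pairs:
        by_case[pair[0]].append(pair)
    for case_id in sorted(by_case):
        test = by_case[case_id]
        train = [p for p in pairs if p[0] != case_id]
        yield case_id, train, test


# Toy data (hypothetical): 3 cases contributing 2, 1, and 3 temporal pairs.
toy_pairs = [
    ("case_a", [0.1], 1), ("case_a", [0.2], 1),
    ("case_b", [0.5], 0),
    ("case_c", [0.3], 0), ("case_c", [0.4], 0), ("case_c", [0.6], 0),
]
splits = list(leave_one_case_out(toy_pairs))
```

In the study this loop would run 56 times, with stepwise feature selection and the LDA fit repeated inside each step on the 55 training cases only.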

To evaluate the improvement in classifier performance obtained by using the temporal change
information, two additional classifiers were designed. One of them was trained using the
information extracted from the current single images of the temporal pairs. We will refer to
these images as current images. The other classifier was trained using the information
extracted from the prior single images of the temporal pairs, and we will refer to these
images as prior images. Comparison of the three classifiers will reveal the effectiveness of
interval change analysis for the classification of malignant and benign masses.

A.  Data  set 

A set of 140 temporal pairs of mammograms containing biopsy-proven masses on the current
mammograms was used to examine the performance of this approach. The data set consisted of
241 mammograms from 56 patients. The mammograms were digitized with a LUMISCAN 85 laser
scanner at a pixel resolution of 50 μm × 50 μm and 4096 gray levels. The digitizer was
calibrated so that gray level values were linearly proportional to the optical density (OD)
within the range of 0-4 OD units, with a slope of 0.001 OD/pixel value. The digitizer output
was linearly converted so that a large pixel value corresponded to a low optical density. The
image matrix size was reduced by averaging every 2 × 2 adjacent pixels and downsampling by a
factor of 2, resulting in images with a pixel size of 100 μm × 100 μm for further analysis.
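A minimal sketch of the 2 × 2 averaging and factor-of-2 downsampling step, assuming the digitized image is held as a list of rows with even dimensions. The function name `downsample_2x2` and the toy image are illustrative only and are not taken from the original software.

```python
def downsample_2x2(image):
    """Average each non-overlapping 2x2 block of a 2-D list of pixel values."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 4x4 image (hypothetical pixel values); the result is 2x2.
toy_image = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
reduced = downsample_2x2(toy_image)
```

Each output pixel covers four input pixels, so 50 μm input pixels become 100 μm output pixels and the matrix size drops by a factor of 2 in each dimension.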

There were 57 biopsy-proven masses (33 malignant and 24 benign) in the 56 cases. The 241
mammograms contained different mammographic views (CC, MLO, and lateral views) and multiple
examinations of the masses including the examination when the biopsy decision was made. By
matching masses of the same view from two different examinations, a total of 140 temporal
pairs were formed, of which 85 were malignant and 55 benign. A malignant temporal pair
consisted of a biopsy-proven malignant mass or a mass that was initially not recommended for
biopsy and later found to be malignant by biopsy in a future year. A similar definition was
used for the benign temporal pairs. Within a pair, the current mammogram was defined as the
mammogram with the later date, and the prior mammogram was defined as the one with the
earlier date. Therefore, in cases with three consecutive exams, more than one temporal pair
could be formed and two of the mammograms could be called "current." Among the 140 temporal
pairs, we had 120 unique current mammograms. Of the masses in the 120 current mammograms, 70
were malignant and 50 benign.

Fig. 2. A malignant mass: (a) the mass in a prior year mammogram (1997), (b) mass outline
obtained by active contour segmentation, (c) the mass in a current year mammogram (1998),
(d) mass outline obtained by active contour segmentation.

Since all cases in this data set had undergone biopsy, the benign masses in this set could
not be distinguished easily from the malignant ones based on current mammographic criteria.
Changes occurred for the benign masses that prompted the radiologists to recommend biopsy.
Examples of such cases are shown in Figs. 2 and 3. The malignant mass in Fig. 2 did not
increase in size but changed its density. The benign mass (Fig. 3), on the other hand,
appeared to have spicules. For the malignant masses in this data set, the average mass size,
estimated by the radiologist as the longest dimension of the mass on the mammogram, was 8.2
mm on the prior mammograms and 12.7 mm on the current mammograms. The corresponding sizes
were 10.6 and 12.2 mm, respectively, for the benign masses. As discussed in Sec. IV, 25 of
the masses on the prior mammograms were too subtle for the radiologist to estimate their
sizes. The average sizes given previously were obtained after excluding all temporal pairs
that involved these masses.

Fig. 3. A benign mass: (a) the mass on a prior year mammogram (1995), (b) mass outline
obtained by active contour segmentation, (c) the mass on a current year mammogram (1996),
(d) mass outline obtained by active contour segmentation.

The radiologist also rated the visibility of the masses on the mammograms relative to those
encountered in clinical practice on a 10-point scale, with 1 representing the most obvious
and 10 representing the most subtle masses. The visibility of the masses on the current
mammogram is plotted against that on the prior mammogram in Fig. 4 for the malignant and
benign temporal pairs. Generally the malignant masses were less visible on the prior than on
the current mammograms, while the visibility of the benign masses was found to be more
similar on the current and prior mammograms. The mean difference in the visibility rating
between the prior and the current mammograms for the malignant masses is 2.8 compared to 1.2
for the benign masses (p = 0.0007 with an unpaired t-test between the malignant and benign
masses). The correlation coefficient is 0.02 for malignant masses [Fig. 4(a)] and 0.37 for
benign masses [Fig. 4(b)]. In addition, the radiologist also estimated the likelihood of
malignancy of the current masses on a 10-point confidence scale (1 = definitely benign, 10 =
definitely malignant) based on the 120 current mammograms alone without comparison with the
prior (Fig. 5). The temporal pairs had a time interval of 6-36 months (Fig. 6). More than
70% of the pairs had a time interval of 12 months.

B.  Feature  extraction 

A rectangular region of interest (ROI) was defined to include the radiologist-identified
mass with an additional surrounding breast tissue region of at least 40 pixels wide from any
point of the mass border. A fully automated method was then used for segmentation of the mass
from the breast tissue background within the ROI. The masses on both the current and the
prior mammograms were automatically segmented within the ROI using a two-dimensional active
contour method that was initialized by K-means clustering.

Fig. 4. Visibility of the masses on the current mammogram plotted against those on the prior
mammogram for (a) malignant and (b) benign temporal pairs. The visibility was rated on a
10-point discrete scale (1 = most obvious, 10 = most subtle). Because many of the data points
overlap, we indicate the number of points with the same rating by a number next to the symbol
(m or b). The diagonal line on the graph represents the cases when the current and the prior
mass sizes are identical. The dashed lines are the linear regression lines for the data
defined by y = 0.038x + 7.86 for (a) and by y = 0.857x + 1.742 for (b). The correlation
coefficient for malignant masses is 0.02 and for benign masses is 0.37.

Fig. 5. The distribution of the malignancy ranking of the masses in the 120 current
mammograms. The rating was performed by an experienced MQSA radiologist (1: definitely
benign, 10: definitely malignant).

The texture features used in this study were calculated from run-length statistics (RLS)
matrices. The RLS matrices were computed from the images obtained by the rubber band
straightening transform (RBST). The RBST maps a band of pixels surrounding the mass onto the
Cartesian plane (a rectangular region). In the transformed image, the mass border appears
approximately as a horizontal edge, and spiculations appear approximately as vertical lines.
A complete description of the RBST can be found in the literature. RLS texture features were
extracted from the vertical and horizontal gradient magnitude images, which were obtained by
filtering the RBST image with horizontally or vertically oriented Sobel filters and computing
the absolute gradient values of the filtered image. Five texture measures, namely, short run
emphasis (SRE), long run emphasis (LRE), gray level nonuniformity (GLN), run length
nonuniformity (RLN), and run percentage (RP) were extracted from the vertical and horizontal
gradient images in two directions, θ = 0° and θ = 90°. Therefore, a total of 20 RLS features
(5 measures × 2 gradient images × 2 directions) were calculated for each ROI. The definition
of the RLS feature measures can be found in the Appendix and in the literature.
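The five run-length measures named above can be illustrated on a small quantized image. The sketch below uses the widely cited Galloway-style definitions, which is an assumption here: the Appendix of the paper is the authoritative source for the exact formulas and gray-level quantization used in the study. The function names and the toy image are hypothetical.

```python
from collections import Counter


def run_lengths(image, direction=0):
    """Count runs of equal gray level along rows (0) or columns (90).

    Returns a Counter mapping (gray_level, run_length) -> number of runs.
    """
    lines = image if direction == 0 else [list(col) for col in zip(*image)]
    runs = Counter()
    for line in lines:
        start = 0
        for k in range(1, len(line) + 1):
            # A run ends at the end of the line or when the gray level changes.
            if k == len(line) or line[k] != line[start]:
                runs[(line[start], k - start)] += 1
                start = k
    return runs


def rls_features(image, direction=0):
    """Return SRE, LRE, GLN, RLN, and RP for one direction."""
    runs = run_lengths(image, direction)
    n_runs = sum(runs.values())
    n_pixels = sum(len(row) for row in image)
    by_level, by_length = Counter(), Counter()
    for (level, length), count in runs.items():
        by_level[level] += count
        by_length[length] += count
    return {
        "SRE": sum(c / (l * l) for (_, l), c in runs.items()) / n_runs,
        "LRE": sum(c * l * l for (_, l), c in runs.items()) / n_runs,
        "GLN": sum(c * c for c in by_level.values()) / n_runs,
        "RLN": sum(c * c for c in by_length.values()) / n_runs,
        "RP": n_runs / n_pixels,
    }


# Toy image already quantized to 3 gray levels (hypothetical values).
toy = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
]
horizontal = rls_features(toy, direction=0)
vertical = rls_features(toy, direction=90)
```

Applying `rls_features` in both directions to both the horizontal and the vertical gradient magnitude images would give the 20 texture features per ROI.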

Fig. 6. Temporal interval between the current and the prior mammograms for the 140 temporal
pairs in our data set (horizontal axis: temporal difference in months).

Morphological features were extracted from the automatically segmented mass shape. Five of
the morphological features were based on the normalized radial length (NRL), defined as the
Euclidean distance from the object's centroid to each of its edge pixels, i.e., the radial
length, and normalized relative to the maximum radial length for the object. The following
five NRL features were extracted: mean (NRLAVG), standard deviation (NRLSD), entropy
(NRLENT), area ratio (NRLAREAR), and zero crossing count (NRLZCC). In addition, the perimeter
(PERIM), area (AREA), circularity (CIRC), rectangularity (SQR), contrast (CONT),
perimeter-to-area ratio (CRR), and Fourier descriptor (FF) features were extracted. The
definitions of the morphological features can be found in the literature. Three of the
morphological features (perimeter, area, and perimeter-to-area ratio) are related to the
mass size and thus are feature descriptors of the mass size.
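As an illustration of the normalized radial length, the sketch below computes the NRL mean and standard deviation from a list of border coordinates. Two simplifications are assumed and are not taken from the paper: the centroid is estimated from the border points rather than from the full segmented region, and the standard deviation is the population form (division by N). The function name `nrl_mean_sd` is hypothetical, and the entropy, area ratio, and zero crossing features are omitted.

```python
import math


def nrl_mean_sd(border):
    """Return (mean, standard deviation) of the normalized radial length.

    `border` is a list of (x, y) edge-pixel coordinates. The centroid is
    approximated here by the mean of the border points (an assumption).
    """
    cx = sum(x for x, _ in border) / len(border)
    cy = sum(y for _, y in border) / len(border)
    radial = [math.hypot(x - cx, y - cy) for x, y in border]
    r_max = max(radial)
    nrl = [r / r_max for r in radial]  # normalize to the maximum radial length
    mean = sum(nrl) / len(nrl)
    sd = math.sqrt(sum((v - mean) ** 2 for v in nrl) / len(nrl))
    return mean, sd


# Two toy outlines: a square's corners and an elongated diamond.
square_mean, square_sd = nrl_mean_sd([(0, 0), (2, 0), (2, 2), (0, 2)])
diamond_mean, diamond_sd = nrl_mean_sd([(2, 0), (0, 1), (-2, 0), (0, -1)])
```

A compact, regular outline gives a mean near 1 and a small standard deviation, while an irregular or elongated outline lowers the mean and raises the spread.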

A spiculation measure was defined for each pixel on the mass border by using the statistics
of the image gradient direction relative to the normal direction to the mass border. The
statistics were determined in a 90° sector centered about the normal at the border pixel and
outside of the mass border. The spiculation measure for each border pixel was normalized to
be between 0 and π/2, with a value of π/4 indicating a random orientation of image gradients,
and larger values indicating a higher likelihood of spiculation. Three features were
extracted from the spiculation measure. The first feature (AVG) was the average of the
spiculation measure for all pixels on the mass boundary. The second feature (PERC_ABV) was
the percentage of border pixels with a spiculation measure larger than π/4, and the third
feature (AVG_ABV) was the average of the spiculation measure for those pixels with a
spiculation measure larger than π/4.
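Given the per-pixel spiculation measure, the three summary features can be sketched as below. The computation of the measure itself (gradient-direction statistics in a 90° sector) is not reproduced. The function name, the toy values, and the choice to return 0.0 for the third feature when no border pixel exceeds the threshold are assumptions made for the example.

```python
import math


def spiculation_features(measures, threshold=math.pi / 4):
    """Summarize per-border-pixel spiculation measures (each in [0, pi/2]).

    threshold = pi/4 corresponds to randomly oriented image gradients.
    """
    above = [m for m in measures if m > threshold]
    return {
        # Average over all border pixels.
        "AVG": sum(measures) / len(measures),
        # Percentage of border pixels above the threshold.
        "PERC_ABV": 100.0 * len(above) / len(measures),
        # Average over the pixels above the threshold (0.0 if none: assumed).
        "AVG_ABV": sum(above) / len(above) if above else 0.0,
    }


# Toy border with four pixels (hypothetical measure values in radians).
toy_features = spiculation_features([0.5, 1.0, 1.2, 0.6])
```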

A total of 35 features (20 RLS, 12 morphological, and 3 spiculation) were therefore extracted
from each ROI. Additionally, difference features were obtained by subtracting a prior feature
from the corresponding current feature. Therefore, 35 difference features were derived from
the 20 RLS, 12 morphological, and 3 spiculation features.
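The difference features are a feature-by-feature subtraction of the prior value from the current value. A minimal sketch, assuming each mass is represented as a dictionary keyed by feature label; the labels follow the text, but the numeric values are made up for illustration.

```python
def difference_features(current, prior):
    """Subtract each prior feature from the corresponding current feature."""
    return {name: current[name] - prior[name] for name in current}


# Hypothetical feature values for one temporal pair.
current_mass = {"V_SRE_0": 0.5, "PERIM": 120.0}
prior_mass = {"V_SRE_0": 0.25, "PERIM": 95.0}
diff = difference_features(current_mass, prior_mass)
```

A positive difference means the feature value increased between the prior and the current examination.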

C.  Feature  selection 

In order to reduce the number of the features and to obtain the best feature subset to design
an effective classifier, feature selection with stepwise linear discriminant analysis was
applied. At each step of the stepwise selection procedure one feature is entered or removed
from the feature pool by analyzing its effect on the selection criterion. In this study, the
Wilks' lambda (the ratio of within-group sum of squares to the total sum of squares) was used
as a selection criterion. The optimization procedure used a threshold F_in for feature entry,
a threshold F_out for feature removal, and a tolerance threshold T for measuring feature
correlation with the other features. In a feature entry step, the features not yet selected
are entered into the selected feature pool one at a time, and the significance of the change
in the Wilks' lambda caused by each feature is estimated based on F statistics. The feature
with the highest significance is entered into the feature pool if its significance is higher
than F_in and its correlation value with the rest of the features in the pool is below T. In
a feature removal step, the features that have already been entered in the selected feature
pool are removed one at a time and the significance of the change in the Wilks' lambda is
estimated. The feature with the least significance is removed from the selected feature pool
if the significance is less than F_out. Since the appropriate values of F_in, F_out, and T
are not known a priori, we examined a range of F_in, F_out, and T values using an automated
simplex optimization method. The appropriate thresholds were chosen in such a way that a
minimum number of features were selected to achieve a high accuracy of classification by LDA.
More details about the stepwise linear discriminant analysis and its application to CAD can
be found elsewhere.

Table I. Classification results for the classifier based on the temporal change information,
the classifier based on current single image information, and the classifier based on prior
single image information.

                  Avg. No. of
Classification    selected features   Training A_z   Test A_z     Test partial A_z^(0.9)
Temporal pairs    10                  0.92           0.88±0.03    0.37±0.10
Current images    11                  0.90           0.82±0.04    0.32±0.08
Prior images      4                   0.78           0.76±0.04    0.24±[illegible]

The feature selection in this study was performed by applying the stepwise feature selection
to the entire feature space (combination of texture, spiculation, and morphological features
altogether) as well as subspaces obtained by different combinations of the three feature
subspaces: texture, spiculation, and morphological features. The stepwise feature selection
uses a sequential forward inclusion and backward elimination approach. The procedure does not
exhaustively evaluate all possible combinations of individual features. It is therefore not
optimal, especially when the feature space is large and the training sample is small. By
limiting the input to the feature subspaces, the dimensionality was reduced compared to the
entire feature space. We found that better feature subsets could be selected by the stepwise
feature selection in the subspaces than in the entire feature space.
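The entry test of the stepwise procedure in this section can be illustrated for the simplest case, a single candidate feature and two classes with an empty feature pool. In that case Wilks' lambda is the within-group sum of squares divided by the total sum of squares, and the F statistic reduces to (n - 2)(1 - Λ)/Λ. The sketch below is a univariate illustration only; the multivariate update, the removal step, the tolerance T, and the simplex search over thresholds are not shown, and the function names and the F_in value are hypothetical.

```python
def wilks_lambda_1d(group_a, group_b):
    """Within-group sum of squares divided by total sum of squares."""
    values = group_a + group_b
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = 0.0
    for group in (group_a, group_b):
        mean = sum(group) / len(group)
        within_ss += sum((v - mean) ** 2 for v in group)
    return within_ss / total_ss


def f_to_enter_1d(group_a, group_b):
    """F statistic for entering the first feature with two classes.

    Undefined (division by zero) if the within-group variation is zero.
    """
    n = len(group_a) + len(group_b)
    lam = wilks_lambda_1d(group_a, group_b)
    return (n - 2) * (1.0 - lam) / lam


# Toy feature values for malignant and benign masses (hypothetical).
malignant = [5.0, 6.0, 7.0]
benign = [1.0, 2.0, 3.0]
lam = wilks_lambda_1d(malignant, benign)
f_stat = f_to_enter_1d(malignant, benign)
F_IN = 3.0  # hypothetical entry threshold, not a value from the study
enter_feature = f_stat > F_IN
```

A smaller Wilks' lambda means better class separation, so a feature that lowers lambda strongly produces a large F value and passes the entry threshold.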

D.  Evaluation  methods 

To evaluate the classifier performance, the training and test discriminant scores were
analyzed using receiver operating characteristic (ROC) methodology. The discriminant scores
of the malignant and benign masses were used as decision variables in the LABROC1 program,
which fits a binormal ROC curve based on maximum likelihood estimation. The classification
accuracy was evaluated as the area under the ROC curve, A_z. The performances of the
classifiers were also assessed by estimating the partial area index (A_z^(0.9)). The partial
area index is defined as the area that lies under the ROC curve but above a sensitivity
threshold of 0.9 (TPF_0 = 0.9), normalized to the total area above TPF_0 (1 - TPF_0). The
partial A_z^(0.9) indicates the performance of the classifier in the high sensitivity (low
false negative) region, which is most important for a cancer detection task.
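The partial area index can be illustrated on an empirical, piecewise-linear ROC curve. This is a swapped-in simplification: the study fits a binormal curve with LABROC1, whereas the sketch below integrates straight-line segments between given operating points. The function name `partial_area_index` and the example curves are assumptions; the ROC points are (FPF, TPF) pairs and are assumed to be non-decreasing.

```python
def partial_area_index(roc_points, tpf0=0.9):
    """Area under a piecewise-linear ROC curve and above TPF = tpf0,
    normalized by the total area above tpf0, which is (1 - tpf0)."""
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        h0, h1 = y0 - tpf0, y1 - tpf0
        if h1 <= 0 or x1 == x0:
            # Segment lies at or below the threshold, or is vertical.
            continue
        if h0 < 0:
            # Start integrating where the segment crosses TPF = tpf0.
            x0 = x0 + (x1 - x0) * (-h0) / (h1 - h0)
            h0 = 0.0
        area += 0.5 * (h0 + h1) * (x1 - x0)  # trapezoid above the threshold
    return area / (1.0 - tpf0)


# Chance diagonal and a perfect classifier as reference curves.
chance = partial_area_index([(0.0, 0.0), (1.0, 1.0)])
perfect = partial_area_index([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

With TPF_0 = 0.9 the index ranges from (1 - TPF_0)/2 = 0.05 for chance performance to 1.0 for a perfect classifier.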

III.  RESULTS 

The performances of the classifiers based on the temporal pairs, the current images, and the
prior images are summarized in Table I. The classifiers that achieved the highest test A_z
values with a small average number of features are presented here. Table II is a summary of
the features selected for each classifier. For the 56 training subsets of temporal pairs used
in this study, an average of 10 features were selected for the classification task. The most
frequently selected features included 4 difference RLS features (3 SRE and 1 LRE), 4 RLS
features (2 SRE, 1 RLN and 1 RP), 1 spiculation feature from the current image, and 1
spiculation feature from the prior image (Table II). The LDA classifier achieved an average
training A_z of 0.92 and a test A_z of 0.88. The test partial A_z^(0.9) was 0.37.

Table II. Selected features for classifiers based on temporal pairs, current images, and
prior images. The letter “H” or “V” at the beginning of the texture feature labels indicates
that the features were extracted from the horizontal or vertical gradient magnitude images,
respectively. The number (0 or 90) at the end of the texture feature labels shows the
direction at which the features were extracted.

Columns: Temporal pairs (Curr, Pr, Diff); Current images (Curr); Prior images (Pr).

                                         No. of X marks in row
Feature type     Group    Features       (column positions not recoverable)
Texture          SRE      H_SRE_0        1
                          H_SRE_90       2
                          V_SRE_0        4
                          V_SRE_90       1
                 LRE      V_LRE_0        2
                          H_LRE_0        1
                 RLN      V_RLN_0        2
                 RP       H_RP_0         2
Spiculation               PERC_ABV       2
                          AVG            1
                          AVG_ABV        1
Morphological             CRR            1
                          NRLZCC         1
                          PERIM          1
                          NRLAVG         1
                          SQR            1
                          CONT           1

For classification of malignant and benign masses using
the current single images (the current images of the temporal
pairs), the LDA classifier selected an average of 11 features
for the 56 training subsets. The most frequently selected features
were 4 RLS features (2 SRE, 1 LRE and 1 RLN), 1
spiculation feature, and 6 morphological features (Table II).
The classifier achieved an average training A_z of 0.90, a test
A_z of 0.82, and a test partial A_z^(0.9) of 0.32.

For the classification of masses based on the prior single
images alone, an average of 4 features were selected for the
56 training subsets. The most frequently selected features
were 3 RLS features (1 SRE, 1 LRE, and 1 RP) and 1 spiculation
feature. The LDA classifier achieved an average training
A_z of 0.78, a test A_z of 0.76, and a test partial A_z^(0.9) of 0.24.

The test ROC curves for the three classifiers are compared
in Fig. 7. The difference in the test A_z between the classifier
based on the temporal pairs and that based on the current
images alone is statistically significant (p = 0.015). The
difference in the test A_z between the classifier based on the
temporal pairs and that based on the prior images alone is
also statistically significant (p = 0.001). The partial area
index for the classifier based on the temporal pairs is also
improved compared to the classifiers based on the current or
the prior images alone, although the differences did not
achieve statistical significance.


IV.  DISCUSSION 

Texture  and  spiculation  features  were  important  for  ma¬ 
lignant  and  benign  classification  of  mammographic  masses 
for  all  three  types  of  classifiers:  the  classifier  based  on  tem¬ 
poral  pair  information,  the  classifier  based  on  current  image 
information,  and  the  classifier  based  on  prior  image  informa¬ 
tion.  One  or  more  of  the  spiculation  features  were  always 
selected  in  all  training  partitions  for  all  three  classifiers.  The 
most frequently selected texture features were the short run
emphasis (SRE) features. They comprised more than 50% of
the texture features selected for the three classifiers (Table II).

Fig. 7. The test ROC curves for the classifiers based on temporal pair
information, current image information, and prior image information.

The temporal-information-based classifier showed improved
performance compared to the classifiers based on current
or prior image information alone. The input feature
space  to  the  temporal-information-based  classifiers  included 
the  current,  prior,  and  difference  features.  This  allows  the 
classifier  to  choose  the  individual  features  or  the  difference 
features.  Using  the  stepwise  feature  selection  procedure  and 
the  linear  discriminant  classifier,  it  was  found  that  the  texture 
and  the  spiculation  features  contained  useful  temporal  infor¬ 
mation  to  perform  malignant  and  benign  mass  classification. 
Texture  features  appeared  to  provide  the  best  information  by 
the  difference  features  obtained  from  subtracting  the  prior 
from  the  corresponding  current  features  (SRE  and  LRE  dif¬ 
ference  features).  On  the  other  hand,  the  best  use  of  the 
spiculation  features  appeared  to  be  a  direct  combination  of 
current  and  prior  features  in  the  input  feature  vector  by  the 
LDA  since  the  individual  features  were  chosen. 

We found that better feature subsets could be selected by
the stepwise feature selection in the subspaces than in the
entire feature space. For example, for the
temporal-information-based classifier, a better feature subset with a
higher test A_z of 0.88 was found when the input feature space
included only the texture and spiculation subspaces. The
addition of the morphological feature subspace to the input
feature space reduced the highest test A_z to 0.84. Similarly,
in the case of the classifier based on prior image information,
a better feature subset was obtained when the texture and
spiculation feature subspaces were used in the input feature
space for stepwise feature selection. Again the addition of
the morphological feature subspace to the input feature space
reduced the highest test A_z to 0.72. The classifier based on
current image information was the only one, among the
three, that obtained a better result, as shown in Table I, when
the morphological feature subspace was included in the input
feature space.

One reason for the poor performance of the morphological
features may be that the masses were more
subtle in the prior images. In fact, the experienced MQSA
mammographer was not confident in seeing 25 of the
"masses" on the prior images and could not provide a mass
size estimation for them. Although the active contour model
would stop the iteration based on the preset criteria and
find an "outline" of the masses on the prior mammograms,
these mass outlines were generally less reliable than those on
the current mammograms in providing morphological
characteristics of the masses. Texture features did not depend as
strongly on the precise mass boundary as morphological
features did. Three out of the four features selected for classification
of the malignant and benign masses on the prior images were
RLS texture features. A spiculation feature was also found to
be a good discriminator.

We also performed ROC analysis of the malignancy
confidence ratings provided by the experienced MQSA
radiologist for the current image data set (120 images). The
distribution of the malignancy ratings is shown in Fig. 5, which
resulted in an A_z value of 0.80 ± 0.04. This indicates that the
masses in the current mammograms cannot be easily
distinguished as malignant or benign even by an experienced
radiologist, consistent with the fact that all lesions had indeed
undergone biopsy. The classifier based on the current image
information has an A_z value of 0.82 ± 0.04, similar to the
accuracy of the radiologist for this data set.

In  this  study,  the  locations  of  the  masses  were  identified 
manually  on  both  the  current  and  the  prior  mammograms  by 
a radiologist. This simulated the situation in which a radiologist
finds a mass in either a diagnostic or a screening setting and
calls upon the CAD algorithm for a second opinion on the
likelihood of malignancy of the mass based on the interval
change  information.  We  are  developing  an  automated  re¬ 
gional  registration  technique  that  can  automatically  locate 
the  mass  on  the  prior  mammogram  based  on  its  location  on 
the  current  mammogram.  The  location  of  the  mass  on  the 
current  mammogram  can  be  identified  by  a  radiologist  or  by 
an  automated  mass  detection  algorithm.  In  the  latter  case,  the 
process  of  mass  detection,  current  and  prior  mass  registra¬ 
tion,  and  classification  can  be  fully  automated.  The  analysis 
of  interval  change  can  be  incorporated  as  one  of  the  func¬ 
tions  provided  by  a  CAD  system  for  interpretation  of  mam¬ 
mograms. 

In  this  study,  we  employed  a  simple  measure  of  temporal 
change  by  taking  the  difference  between  the  feature  from  the 
current  mass  and  the  corresponding  feature  from  the  prior 
mass.  We  observed  improvement  in  classification  with  this 
simple temporal information. It will be important to evaluate
other similarity measures that can characterize small
differences in image features of the object of interest. It can be
expected  that  a  more  sensitive  similarity  measure  will  pro¬ 
vide  a  better  measurement  of  dissimilarity,  or  difference,  be¬ 
tween  the  current  and  prior  masses  and  further  improve  the 
utilization  of  the  temporal  change  information  on  mammo¬ 
grams. 
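The simple temporal measure described above (current feature minus the corresponding prior feature, offered to the classifier alongside the individual current and prior features) can be sketched as follows. The function name, the dictionary layout, and the `curr_`/`prior_`/`diff_` prefixes are hypothetical and chosen only for illustration.

```python
def temporal_feature_vector(current, prior):
    """Build the input feature space for a temporal-pair classifier.

    current, prior: dicts mapping a feature name to its value on the
    current and prior mass, respectively (same feature names in both).
    Returns a dict with the current, prior, and difference features.
    """
    if current.keys() != prior.keys():
        raise ValueError("current and prior must contain the same features")
    vector = {}
    for name in sorted(current):
        vector["curr_" + name] = current[name]
        vector["prior_" + name] = prior[name]
        # difference feature: prior subtracted from current
        vector["diff_" + name] = current[name] - prior[name]
    return vector


features = temporal_feature_vector({"V_SRE_0": 0.50}, {"V_SRE_0": 0.25})
print(features)
```

A feature selection step can then choose freely among the individual and the difference features, as described in the Discussion.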


V.  CONCLUSION 

We  performed  a  preliminary  study  to  evaluate  the  effec¬ 
tiveness  of  interval  change  analysis  for  classification  of  ma¬ 
lignant  and  benign  masses  on  mammograms.  It  was  found 
that  the  difference  RLS  texture  features  and  spiculation  fea¬ 
tures  were  useful  for  identification  of  malignancy  in  tempo¬ 
ral  pairs  of  mammograms.  The  information  on  the  prior  im¬ 
age  was  important  for  characterization  of  the  masses;  5  out 
of  the  10  selected  features  contained  prior  information.  We 
found  that  the  mass  size  descriptors  were  not  discriminatory 
features  for  these  difficult  cases  because  many  of  the  benign 
masses also grew over time. In comparison with the
classification based on image information from the current images
alone, the temporal change information significantly (p
= 0.015) improved the accuracy for classification of the
masses in terms of the total area under the ROC curve (A_z).
The partial area under the ROC curve for the classifier based
on the temporal pairs (A_z^(0.9) = 0.37) is also improved
compared to the classifier based only on the current images
(A_z^(0.9) = 0.32), although the difference did not achieve
statistical significance. Further studies are under way to improve
this  temporal  change  classification  technique  and  to  evaluate 
its  performance  on  a  larger  data  set. 


Medical  Physics,  Vol.  28,  No.  11,  November  2001 



8  Hadjiiskl  et  al.:  Analysis  of  temporal  changes  of  mammographic  features 


ACKNOWLEDGMENTS 

This  work  is  supported  by  a  Career  Development  Award 
from  the  USAMRMC  (No.  DAMD  17-98-1-8211)  (L.H.), 
USPHS  Grant  No.  CA48129,  and  a  USAMRMC  grant  (No. 
DAMD  17-96-1-6254).  The  content  of  this  publication  does 
not  necessarily  reflect  the  position  of  the  government  and  no 
official  endorsement  of  any  equipment  and  product  of  any 
companies  mentioned  in  the  publication  should  be  inferred. 
The  authors  are  grateful  to  Charles  E.  Metz,  Ph.D.,  for  the 
LABROC  program. 


APPENDIX:  RUN  LENGTH  STATISTICS  TEXTURE 
FEATURES 

A gray level run length is a set of consecutive collinear
pixels all having the same gray level value. The length of the
run is the number of pixels in the run. For a given image it is
possible to compute a gray level run length matrix for runs in
any given direction. In this study, two directions are used:
θ = 0° and θ = 90°. Let p(i,j) be the number of times there
is a run of length j that has a gray level i. Let N_g be the
number of gray levels and N_r be the number of runs. The
short run emphasis is defined as


$$\mathrm{SRE} = \frac{1}{N_r} \sum_{i=1}^{N_g} \sum_{j} \frac{p(i,j)}{j^2}.$$

This  feature  divides  the  frequency  of  each  run  length  by 
the  length  of  the  run  squared.  This  tends  to  emphasize  short 
runs.  The  denominator  is  the  total  number  of  runs  in  the 
image  and  serves  as  a  normalizing  factor.  The  long  run  em¬ 
phasis  is  defined  as 

$$\mathrm{LRE} = \frac{1}{N_r} \sum_{i=1}^{N_g} \sum_{j} j^2 \, p(i,j).$$


This  feature  multiplies  the  frequency  of  each  run  length  by 
the  length  of  the  run  squared.  This  tends  to  emphasize  long 
runs. 

The  gray  level  nonuniformity  is  defined  as 


$$\mathrm{GLN} = \frac{1}{N_r} \sum_{i=1}^{N_g} \Bigl( \sum_{j} p(i,j) \Bigr)^2.$$


This  feature  squares  the  number  of  run  lengths  for  each  gray 
level.  This  measures  the  gray  level  nonuniformity  of  the  im¬ 
age.  If  the  runs  are  equally  distributed  over  all  gray  levels, 
the  feature  takes  on  its  lowest  values.  A  larger  run  length 
contributes  more  to  the  feature  value. 

Run  length  nonuniformity  is  defined  as 


$$\mathrm{RLN} = \frac{1}{N_r} \sum_{j} \Bigl( \sum_{i=1}^{N_g} p(i,j) \Bigr)^2.$$


This  feature  measures  the  nonuniformity  of  the  run  lengths. 
If  the  runs  are  equally  distributed  over  all  lengths,  the  feature 
will  have  a  low  value.  A  larger  run  contour  contributes  more 
to  the  feature  value. 
Run percentage is defined as


$$\mathrm{RP} = \frac{N_r}{P}.$$


This  feature  is  a  ratio  of  the  total  number  of  runs  to  the  total 
number  of  possible  runs  (P)  if  all  runs  have  a  length  of  one. 
The  above-given  definitions  are  based  on  Galloway^^  and 
more  details  can  be  found  in  this  reference. 
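
The run length statistics defined in this appendix can be illustrated with a short sketch. It assumes an image stored as a list of rows of integer gray levels and counts runs along the rows (the 0° direction); the 90° direction could be obtained by transposing the image first. The function names and the choice of P as the number of pixels are illustrative assumptions based on the definitions above, not the authors' code.

```python
def run_length_matrix(image, n_gray):
    """Count runs along rows: p[i][j - 1] = number of runs of gray level i, length j."""
    max_len = max(len(row) for row in image)
    p = [[0] * max_len for _ in range(n_gray)]
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            # a run ends at the end of the row or where the gray level changes
            if k == len(row) or row[k] != row[start]:
                p[row[start]][k - start - 1] += 1
                start = k
    return p


def rls_features(p, n_pixels):
    """Compute SRE, LRE, GLN, RLN and RP from a run length matrix p."""
    n_runs = sum(sum(row) for row in p)   # total number of runs
    lengths = range(len(p[0]))            # index j - 1 for run length j
    sre = sum(row[j] / (j + 1) ** 2 for row in p for j in lengths) / n_runs
    lre = sum(row[j] * (j + 1) ** 2 for row in p for j in lengths) / n_runs
    gln = sum(sum(row) ** 2 for row in p) / n_runs
    rln = sum(sum(row[j] for row in p) ** 2 for j in lengths) / n_runs
    rp = n_runs / n_pixels                # P: runs possible if all had length one
    return {"SRE": sre, "LRE": lre, "GLN": gln, "RLN": rln, "RP": rp}


image = [[0, 0, 1],
         [1, 1, 1]]
matrix = run_length_matrix(image, n_gray=2)
print(matrix)
print(rls_features(matrix, n_pixels=6))
```

In this toy image there are three runs (lengths 2, 1 and 3), so the long run emphasis is (4 + 1 + 9)/3 and the run percentage is 3/6.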
Analysis of interval change is important for mammographic interpretation. The aim of this study is
to  evaluate  the  use  of  an  automated  registration  technique  for  computer-aided  interval  change 
analysis  in  mammography.  Previously  we  developed  a  regional  registration  technique  for  identify¬ 
ing  masses  on  temporal  pairs  of  mammograms.  In  the  current  study,  we  improved  lesion  registra¬ 
tion  by  including  a  local  alignment  step.  Initially,  the  lesion  position  on  the  prior  mammogram  was 
estimated  based  on  the  breast  geometry.  An  initial  fan-shaped  search  region  was  then  defined  on  the 
prior  mammogram.  In  the  second  stage,  the  location  of  the  fan-shaped  region  on  the  prior  mam¬ 
mogram  was  refined  by  warping,  based  on  an  affine  transformation  and  simplex  optimization  in  a 
local  region.  In  the  third  stage,  a  search  for  the  best  match  between  the  lesion  template  from  the 
current  mammogram  and  a  structure  on  the  prior  mammogram  was  carried  out  within  the  search 
region. This technique was evaluated on 124 temporal pairs of mammograms containing
biopsy-proven masses. Eighty-seven percent of the estimated lesion locations resulted in an area overlap of
at least 50% with the true lesion locations and an average distance of 2.4 ± 2.1 mm between their
centroids. The average distance between the estimated and the true centroid of the lesions on the
prior mammogram over all 124 temporal pairs was 4.2 ± 5.7 mm. The registration accuracy was
improved  in  comparison  with  our  previous  study  that  used  a  data  set  of  74  temporal  pairs  of 
mammograms.  This  improvement  in  accuracy  resulted  from  the  improved  geometry  estimation  and 
the  local  affine  transformation.  ©  2001  American  Association  of  Physicists  in  Medicine. 

[DOI:  10.1118/1.1376134] 

Key  words:  mammography,  interval  change,  computer-aided  diagnosis,  breast  cancer,  affine 
transformation 


I.  INTRODUCTION 

Mammography is currently the most effective method for
early breast cancer detection.¹,² One of the important
techniques used by radiologists in mammographic interpretation
to detect developing malignancy is analysis of interval
changes.³,⁴ A variety of computer-aided diagnosis (CAD)
techniques have been developed to detect mammographic
abnormalities and to distinguish between malignant and
benign lesions. We are studying the use of CAD techniques to
assist radiologists in interval change analysis.

Sallam et al.⁵ have proposed a warping technique for
mammogram registration based on manually identified
control points. A mapping function was calculated for mapping
each point on the current mammogram to a point on the prior
mammogram. Brzakovic et al.⁶ have investigated a
three-step method for comparison of the most recent and the prior
mammograms. They first registered two mammograms using
the method of principal axis, and partitioned the current
mammogram using a hierarchical region-growing technique.
Translation, rotation, and scaling were then used for
registration of the partitioned regions. Vujovic et al.⁷ have proposed
a multiple-control-point technique for mammogram
registration. They first determined several control points
independently on the current and prior mammograms based on the
intersection points of prominent anatomical structures in the
breast. A correspondence between these control points was
established based on a search in a local neighborhood around
the control point of interest.

The  previous  techniques  depend  on  the  identification  of 
control  points.  However,  because  the  breast  is  mainly  com¬ 
posed  of  soft  tissue  that  can  change  over  time,  there  are  no 
obvious  landmarks  on  mammograms.  The  crossing  line 
structures  are  often  fibrous  tissue  from  different  depths  of  the 
breast  which  overlap  in  a  projection  image.  These  crossing 
points  are  not  invariant  landmarks  on  different  mammo¬ 
grams.  Because  of  the  elasticity  of  the  breast  tissue,  there  is 
large  variability  in  the  positioning  and  compression  used  in 
mammographic examination. As a result, the relative
positions of the breast tissues projected onto a mammogram vary
from  one  examination  to  the  other.  Techniques  that  depend 
on  identification  of  control  points  may  not  be  generally  ap¬ 
plicable  to  registration  of  breast  images. 

Gopal et al.⁸⁻¹⁰ and Hadjiiski et al.¹¹ have developed a
multistage technique that defines the transformation to
locally map the position of the mass on a current mammogram
to that of the prior mammogram. A local search for the mass
is then performed on the prior mammogram. Good et al.¹²
also have developed a technique that defines a
transformation to map all points from the current mammogram onto a
prior mammogram. The current mammogram is then
subtracted from the prior mammogram.

Fig. 1. Block diagram of the regional registration technique.

The  goal  of  our  research  is  to  develop  a  technique  for 
computerized  analysis  of  temporal  differences  between  a 
mass  on  the  most  recent  mammogram  and  a  prior  mammo¬ 
gram  of  the  same  view.  The  computer  algorithm  will  assist 
radiologists  in  quantifying  interval  changes  and  thus  distin¬ 
guishing  between  benign  and  malignant  masses  for  CAD. 
When fully developed, the technique will be applied to a
mass on the current mammogram identified either by the
radiologist or by an automated mass detection program, so that
the interval change analysis can be an integrated part of an
automated CAD system. In this study, we focused on the
development  of  an  automated  registration  technique  that  lo¬ 
calizes  the  corresponding  mass  on  the  prior  mammogram 
when  the  mass  on  the  current  mammogram  is  known.  There¬ 
fore,  we  used  radiologist-identified  mass  location  on  the  cur¬ 
rent  mammogram  as  a  starting  point  and  that  on  the  prior 
mammogram  as  the  ground  truth  for  evaluation  of  the  regis¬ 
tration  technique.  A  local  registration  technique  was  devel¬ 
oped  based  on  an  affine  transformation  and  simplex  optimi¬ 
zation  and  its  usefulness  in  improving  the  localization  of  the 
mass  on  the  prior  mammogram  was  investigated. 


II.  REGISTRATION  TECHNIQUE 

A  multistage  regional  registration  technique  was  devel¬ 
oped  for  identifying  corresponding  masses  on  temporal  pairs 
of  mammograms.  The  block  diagram  of  the  regional  regis¬ 
tration  technique  is  shown  in  Fig.  1.  In  the  first  stage,  an 
initial  fan-shaped  search  region  was  defined  on  the  prior 
mammogram  based  on  the  mass  location  on  the  current 
mammogram.  In  the  second  local  alignment  stage,  the  loca¬ 
tion  of  the  search  region  on  the  prior  mammograms  was  first 
refined  by  maximizing  a  correlation  measure  between  a  tem¬ 
plate  of  the  fan-shaped  region  centered  at  the  mass  extracted 
from  the  current  mammogram  and  the  breast  structures  on 
the  prior  mammogram.  The  affine  transformation  in  combi¬ 
nation  with  simplex  optimization  was  then  employed  to  warp 
this  local  region  and  further  improve  the  correlation.  In  the 
final stage, a search for the best match between the lesion
template from the current mammogram and a structure on
the prior mammogram was carried out within the refined
search region. A more detailed explanation for each of the
stages will be presented in the following subsections.

Fig. 2. An example of a pair of current and prior mediolateral oblique
mammograms in our data set. The arrows point to the masses on the current
and the prior mammograms. The white lines represent the breast boundary
determined by the automated boundary detection procedure.

A.  Stage  1— Initial  estimate  of  search  region 

We have modified our previous method to define a
fan-shaped search region on the prior mammogram. Initially an
automated procedure is used to detect the breast boundary on
the mammograms (Fig. 2). The location of the mass on the
current mammogram is determined in a polar coordinate
system with the nipple as the origin. By using the radial distance
R_curr between the nipple and mass centroid, |NM|, an arc is
drawn which intersects the breast boundary at points A and
B (Fig. 3). Three angles are estimated at the radial distance
R_curr: the angle ρ between NM and NA, the angle φ
between NM and NB, and the angle θ between NA and NB
(θ = ρ + φ). The location of the mass is determined by R_curr
and the angle ρ or φ. The angle θ is the breast width at the
radial distance R_curr. Using the radial distance R_curr to draw
an arc centered at the nipple centroid on the prior
mammogram, N', the two intersect points A' and B' with the breast
boundary on the prior mammogram are determined. The
angle θ_p between the axes |N'A'| and |N'B'| is estimated.
An angular scaling factor α can be calculated as the ratio of
the prior and the current angles, α = θ_p/θ.

Fig. 3. Initial estimation of the mass location on the prior mammogram,
based on the nipple-mass centroid distance and an angular distance from the
breast periphery on the current mammogram.

Fig. 4. Definition of an initial fan-shaped search region on the prior
mammogram and a fan-shaped template on the current mammogram.

In order to predict the angular location of the mass on the
prior mammogram, the smaller angle between ρ and φ is
selected as the angular coordinate of the mass on the current
mammogram. The smaller angle is used because we found
by experiment that it produces a smaller angular deviation
error than using the larger angle. The angular deviation error
is defined as the angle between the axis connecting the
nipple and the true mass centroid and the axis connecting the
nipple and the predicted mass centroid on the prior
mammogram. The selected angle, multiplied by the angular scaling
factor α, is used as the predicted angle from the
corresponding axis on the prior mammogram. The radial distance R_curr
is used to predict the radial position of the mass on the prior
mammogram.
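
The prediction step above can be written out as a small sketch. It assumes that the directions of the rays N'A' and N'B' on the prior mammogram are available as absolute angles in radians (with the mass lying between them), and that the nipple position on the prior is known; these inputs, the function name, and the coordinate convention are illustrative assumptions rather than details given in the text.

```python
import math


def predict_prior_location(r_curr, rho, phi, nipple_prior,
                           angle_a_prior, angle_b_prior):
    """Predict the mass centroid on the prior mammogram.

    r_curr:        nipple-to-mass distance on the current mammogram
    rho, phi:      angles (radians) from NM to NA and from NM to NB
    nipple_prior:  (x, y) of the nipple N' on the prior mammogram
    angle_a_prior, angle_b_prior: directions (radians) of N'A' and N'B',
                   assumed ordered so that angle_b_prior > angle_a_prior
    """
    theta = rho + phi                          # breast width on the current
    theta_p = angle_b_prior - angle_a_prior    # breast width on the prior
    alpha = theta_p / theta                    # angular scaling factor
    if rho <= phi:
        # the smaller angle is measured from NA: scale it from N'A'
        angle = angle_a_prior + alpha * rho
    else:
        # the smaller angle is measured from NB: scale it from N'B'
        angle = angle_b_prior - alpha * phi
    x = nipple_prior[0] + r_curr * math.cos(angle)
    y = nipple_prior[1] + r_curr * math.sin(angle)
    return x, y, alpha


print(predict_prior_location(10.0, math.radians(30), math.radians(60),
                             (0.0, 0.0), 0.0, math.radians(45)))
```

In this idealized geometry, where θ_p is exactly the angle between the two prior rays, scaling either angle leads to the same direction; the experimental difference reported in the text presumably reflects details of the angle estimation that this sketch does not model.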

An initial fan-shaped search region is then defined on the
prior mammogram centered at the predicted location of the
mass centroid (Fig. 4). The size of the fan-shaped region was
estimated previously¹⁰ to have the form ε = k_1 + k_2/R_curr and
δ = k_3, where 2ε determines the angular width and 2δ
determines the radial length of the fan-shaped region. The
constants k_1, k_2, and k_3 were chosen experimentally such that
the estimated fan-shaped regions will essentially include all
mass centroids on the prior mammograms. A fan-shaped
template centered at the mass is also defined on the current
mammogram. More details on defining the fan-shaped region
can be found in Appendix A and in Ref. 10.

B.  Stage  2 — Refinement  of  search  region  by  warping 
and  alignment 

The second stage combined two procedures. First, the
location of the search region on the prior mammograms was
refined by maximizing a correlation measure between the
fan-shaped template extracted from the current mammogram
and the breast structures on the prior mammogram. The
template was shifted pixel by pixel within the initial fan-shaped
search region and a correlation measure was calculated at
each pixel location. The pixel location providing the
maximum correlation is used as the center of a refined search
region. This is basically a template matching operation.
Second, the affine transformation in combination with simplex
optimization was iteratively used to warp the fan-shaped
template and further maximize the correlation measure with
the breast structures on the prior mammogram.

Fig. 5. The fan-shaped template (x, y) and the warped fan-shaped template
(x', y') by the affine transformation.

1.  Affine  transformation 

An affine transformation¹³ is a linear transformation
combining scaling, rotation, and translation. A two-dimensional
affine transformation is defined as follows:

x' = ax + by + c,
y' = dx + ey + f,     (1)

where (x, y) are the original coordinates, (x', y') are the
transformed coordinates, and a, b, d, e, c, f are the
transformation coefficients. The coefficients a, b, d, e determine a
scaling and a rotation, and the coefficients c and f determine
a translation. The result of applying the affine transformation
of Eq. (1) in combination with the simplex optimization
(described below) to refine the fan-shaped search region is
shown in Fig. 5. Since the affine transformation is linear, the
transformed object is linearly resized and rotated. This can
be observed from the edges of the bounding box of the
fan-shaped region (white box in Fig. 5). After the transformation
the edges are still straight lines; however, the corner angles
are different from 90 degrees and the lengths of the lines are
linearly scaled.
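
As a brief illustration of Eq. (1), the sketch below applies a two-dimensional affine transformation to the corners of a bounding box. The function name and the coefficient values are arbitrary examples chosen for illustration, not values from the study.

```python
def affine_transform(points, a, b, c, d, e, f):
    """Apply x' = a*x + b*y + c, y' = d*x + e*y + f to a list of (x, y) points."""
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


# Corners of a unit bounding box
box = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Example coefficients: non-uniform scaling, a small off-diagonal term, and a shift
warped = affine_transform(box, a=2.0, b=0.5, c=3.0, d=0.0, e=1.5, f=-1.0)
print(warped)
```

The edges of the warped box remain straight lines, while the off-diagonal coefficient b makes the corner angles differ from 90 degrees, as noted in the text.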

2. Nonlinear simplex optimization

The nonlinear simplex optimization by Nelder and
Mead¹⁴,¹⁵ is used to adjust the coefficients a, b, c, d, e,
and f and to warp the fan-shaped template, thereby
maximizing the correlation between the template and a breast
structure on the prior mammogram. This optimization defines a
hyper-polygon. For each vertex an error function is
calculated. The polygon is then "rolled" towards the minimum.
The movement of the polygon (towards the minimum) is
obtained by reflection in the direction opposite to the vertex
with the maximal error. Figure 5 shows the result of
application of the affine transformation whose coefficients were
obtained by the nonlinear simplex optimization. A more
detailed discussion on this optimization method can be found in
Appendix B and Refs. 14 and 15.
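
The reflection-based movement of the simplex can be illustrated with a compact sketch. This is a simplified textbook variant of the Nelder-Mead method (reflection, expansion, inside contraction, shrink) with commonly used coefficients; the function name, step size, and iteration count are assumptions for illustration and do not reproduce the settings used in the study.

```python
def nelder_mead(func, start, step=0.5, iterations=200):
    """Minimize func (list of floats -> float) with a basic Nelder-Mead simplex."""
    n = len(start)
    # initial simplex: the start point plus one point offset along each axis
    simplex = [list(start)]
    for k in range(n):
        vertex = list(start)
        vertex[k] += step
        simplex.append(vertex)

    def move(p, q, t):
        # point p + t * (p - q): t > 0 moves away from q, t < 0 towards q
        return [pi + t * (pi - qi) for pi, qi in zip(p, q)]

    for _ in range(iterations):
        simplex.sort(key=func)
        best, worst = simplex[0], simplex[-1]
        # centroid of all vertices except the one with the maximal error
        centroid = [sum(v[k] for v in simplex[:-1]) / n for k in range(n)]
        reflected = move(centroid, worst, 1.0)
        if func(reflected) < func(best):
            expanded = move(centroid, worst, 2.0)
            simplex[-1] = expanded if func(expanded) < func(reflected) else reflected
        elif func(reflected) < func(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = move(centroid, worst, -0.5)
            if func(contracted) < func(worst):
                simplex[-1] = contracted
            else:
                # shrink every vertex halfway towards the best one
                simplex = [best] + [move(best, v, -0.5) for v in simplex[1:]]
    return min(simplex, key=func)


# Toy error function with its minimum at (1, -2)
print(nelder_mead(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [0.0, 0.0]))
```

To maximize a correlation measure with such a minimizer, the error function can be taken as the negative of the correlation, with the six affine coefficients as the parameter vector.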
Fig. 6. A refined search region was defined on the prior mammogram. A
search for the best match between the mass template from the current
mammogram and a structure on the prior mammogram was carried out within the
refined search region. (A: mass template on current mammogram,
B: warped fan-shaped region from current mammogram, C: refined search
region).

C. Stage 3 — Mass template matching and
localization of corresponding lesion

At  this  stage  a  new  search  region  with  a  reduced  size  is 
defined  on  the  prior  mammogram  (Fig.  6).  The  reduced  size 
of  the  search  region  is  determined  experimentally  by  itera¬ 
tive  adjustment  of  the  size  of  the  rectangular  region  targeting 
the  improvement  of  the  final  result.  A  template  containing 
the  mass  is  extracted  from  the  current  mammogram.  The 
mass  location  on  the  prior  mammogram  is  then  determined 
by  maximizing  the  correlation  between  the  template  and  a 
structure  within  the  search  region  (Fig.  7). 
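
The template matching used in Stages 2 and 3 can be sketched as an exhaustive search over template positions. The text does not specify the correlation measure, so the Pearson correlation coefficient is assumed here; images are plain lists of rows, and the function names are illustrative.

```python
import math


def correlation(a, b):
    """Pearson correlation between two equally sized lists of pixel values."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat patch carries no structure to correlate with
    return cov / math.sqrt(var_a * var_b)


def match_template(image, template):
    """Shift the template pixel by pixel; return (row, col) of the best match and its score."""
    th, tw = len(template), len(template[0])
    flat_template = [v for row in template for v in row]
    best_score, best_pos = -2.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = correlation(patch, flat_template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


search_region = [[0, 0, 0, 0],
                 [0, 0, 5, 9],
                 [0, 0, 7, 1],
                 [0, 0, 0, 0]]
mass_template = [[5, 9],
                 [7, 1]]
print(match_template(search_region, mass_template))
```

Restricting the loops to the refined search region, rather than the whole image, corresponds to the reduced search region described in this stage.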

III.  DATA  SET 

A set of 124 temporal pairs of mammograms containing
biopsy-proven masses on the current mammograms was used
to examine the performance of this approach. Different
mammographic views of the same breast were also included.
There were a total of 221 mammograms obtained from 54
cases. Temporal pairs were formed using the temporal
sequence from the corresponding view. Some cases contained
mammograms from multiple years, and combining the
mammograms from different prior years with the current-year
mammogram formed multiple temporal pairs. Thirty-five of
the mammograms were digitized with a LUMISYS DIS-1000
laser scanner at a pixel size of 100 μm × 100 μm and 4096
gray levels. The digitizer was calibrated so that gray level
values were linearly proportional to the optical density (OD)
within the range of 0.1-2.8 OD units, with a slope of 0.001
OD/pixel value. Outside this range, the slope of the
calibration curve decreased gradually. The OD range of the
digitizer was 0-3.5. The remaining 186 mammograms were
digitized with a LUMISCAN 85 laser scanner at a pixel size
of 50 μm × 50 μm and 4096 gray levels. This digitizer was
calibrated so that the gray level values were linearly
proportional to the OD within the range of 0-4 OD units,
also with a slope of 0.001 OD/pixel value. Output from both
digitizers was linearly converted so that a large pixel value
corresponded to a low optical density. In order to process the
mammograms digitized with these two different digitizers,
the images were first averaged using a filter that has constant
weights over the entire filter kernel and were then
down-sampled. This filter will be referred to as a box filter.
The images digitized with the LUMISCAN 85 digitizer were
averaged with a 16 × 16 box filter and then down-sampled by
a factor of 16. The images digitized with the LUMISYS
DIS-1000 digitizer were averaged with an 8 × 8 box filter and
then down-sampled by a factor of 8. Therefore, all resulting
images had a pixel size of 800 μm × 800 μm.

Fig. 7. Final identification of the corresponding mass on the prior
mammogram. (A: mass template on current mammogram, B: refined search
region, C: identified mass location.)
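
The averaging and down-sampling used to bring both digitizers to a common pixel size can be sketched as below. This is an illustrative sketch under stated assumptions: the sampling grid is taken to be aligned with non-overlapping blocks, so that box filtering followed by down-sampling reduces to block averaging, and any trailing rows or columns that do not fill a complete block are dropped. The function names are hypothetical.

```python
def box_filter_downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list.

    Equivalent to filtering with a constant-weight (box) kernel and
    keeping one sample per block (assumed block-aligned sampling).
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0
            for dr in range(factor):
                for dc in range(factor):
                    total += image[r * factor + dr][c * factor + dc]
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out


def output_pixel_size_um(input_pixel_size_um, factor):
    """Pixel size after down-sampling by an integer factor."""
    return input_pixel_size_um * factor
```

With the values given in the text, a 50 μm image down-sampled by 16 and a 100 μm image down-sampled by 8 both end up at 800 μm.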

The 54 cases contained 53 biopsy-proven masses and one
follow-up mass. The 221 mammograms contained different
mammographic views and multiple years of the masses,
including the year when the biopsy was performed. Of the
124 temporal pairs of mammograms, 73 were malignant and
51 were benign. A malignant temporal pair consists of a
biopsy-proven malignant mass or a mass that was followed up
and was found to be malignant when a biopsy was performed
in a later year. Of the 124 temporal pairs of mammograms, 63
were CC-view pairs, 48 were MLO-view pairs, and 13 were
lateral-view pairs. A Mammography Quality Standards Act
(MQSA)-approved radiologist read the original mammograms
to identify the mass and provide a description of its
characteristics. The radiologist defined a bounding box around
the mass and marked the nipple location on every film.

The radiologist also measured the mass sizes, defined as
the longest dimension of the mass, on both the current and
prior mammograms. In Figs. 8(a) and 8(b) the mass sizes on
the current mammograms are plotted against those on the
prior mammograms for the malignant and the benign temporal
pairs, respectively. Only 103 temporal pairs are plotted
(54 malignant and 49 benign) because the masses on the prior
mammograms in the remaining 21 temporal pairs were too
subtle for the radiologist to estimate their boundaries. On
average, the malignant masses appear to have a larger
increase in size than the benign masses. The mean increase in
size from prior to current for the malignant masses is 4.2 mm
compared to 1.6 mm for the benign masses (p = 0.008). The
correlation coefficient is 0.71 for the malignant masses and
0.83 for the benign masses [Figs. 8(a) and 8(b)].

Fig. 8. Mass sizes measured by an MQSA-approved radiologist on the
current mammograms plotted against those on the prior mammograms for (a)
54 malignant and (b) 49 benign temporal pairs. The diagonal line on the
graph represents the case when the current and the prior mass sizes are
identical. The dashed lines are the linear regression lines defined by
$y = 0.469x + 3.012$ for (a) and by $y = 0.638x + 3.242$ for (b). The
correlation coefficient for malignant masses is 0.71 and for benign
masses is 0.83.

The radiologist also rated the visibility of the masses on
the mammograms relative to those encountered in clinical
practice on a 10-point scale, with 1 representing the most
obvious and 10 the subtlest masses. The visibility of the
masses on the current mammogram is plotted against that
on the prior mammogram in Fig. 9 for the 73 malignant and
51 benign temporal pairs. Generally, the malignant masses
were less visible on the prior mammograms, while the
visibility of the benign masses was more similar between the
two exams. The mean difference in visibility between the
prior and the current mammograms for the malignant masses
is 2.8 compared to 0.7 for the benign masses (p = 0.0002).
The correlation coefficient is 0.06 for malignant masses and
0.54 for benign masses [Figs. 9(a) and 9(b)]. For most of the
temporal pairs the time interval between the current and the
prior mammogram was 12 months (Fig. 10).

Fig. 9. Visibility of the masses on the current mammogram plotted against
that on the prior mammogram for (a) malignant and (b) benign temporal
pairs. The visibility was rated on a 10-point discrete scale (1 = most
obvious, 10 = subtlest). Because many of the data points overlap, we
indicate the number of points with the same rating by a number next to
the symbol (m or b). The diagonal line on the graph represents the case
when the current and the prior visibility ratings are identical. The
dashed lines are the linear regression lines defined by
$y = 0.055x + 7.44$ for (a) and by $y = 0.658x + 2.138$ for (b). The
correlation coefficient for malignant masses is 0.06 and for benign
masses is 0.54.


IV.  EVALUATION  METHODS 

The accuracy of the multistage regional registration was
analyzed in terms of two measures. The first measure is the
overlap area between the estimated and the true lesions on
the prior mammogram. The fractions of registered temporal
pairs that could provide an accuracy of over 50% area
overlap and over 75% area overlap were examined. The second
measure is the average Euclidean distance between the
centroids of the estimated and the true lesion locations.
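
The two measures can be illustrated with a short sketch. Several details are assumptions made only for illustration: lesions are represented as axis-aligned bounding boxes `(x0, y0, x1, y1)` in millimeters, the area overlap is expressed as a fraction of the true lesion area (the text does not state the normalization), and the centroid is taken as the center of the box.

```python
import math


def box_area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); zero if degenerate."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def overlap_fraction(estimated, true):
    """Intersection area divided by the true lesion area (assumed normalization)."""
    inter = (max(estimated[0], true[0]), max(estimated[1], true[1]),
             min(estimated[2], true[2]), min(estimated[3], true[3]))
    return box_area(inter) / box_area(true)


def centroid_distance(estimated, true):
    """Euclidean distance between the centers of two boxes."""
    ex, ey = (estimated[0] + estimated[2]) / 2, (estimated[1] + estimated[3]) / 2
    tx, ty = (true[0] + true[2]) / 2, (true[1] + true[3]) / 2
    return math.hypot(ex - tx, ey - ty)
```

A pair would then count toward the 50% or 75% criterion when `overlap_fraction` reaches 0.5 or 0.75, respectively.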

Fig. 10. Temporal interval (in months) between the current and the prior
mammograms for the 124 temporal pairs in our data set.

V.  REGISTRATION  RESULTS 

A.  Stage  1— Initial  estimate  of  search  region 

At this stage an initial estimate of the mass location on
the prior mammogram was obtained based on the geometrical
position of the mass on the current mammogram. Based on
observation of the radial deviation errors and the angular
deviation errors, the fan-shaped search region was defined by
$\epsilon = 0.25 + 5/R_{curr}$ radians and $\delta = 20$ mm. This
definition of the fan-shaped search region resulted in an
average search area of 1462 mm² on the prior mammograms.
For the 124 temporal image pairs used in this study, the
Euclidean distance between the initial estimate of the
centroid location of the corresponding structure on the prior
mammogram and the center of the bounding box of the mass
provided by the radiologist was calculated. The average
Euclidean distance error of the initial estimate was
8.4 ± 5.4 mm. The error distributions for both the malignant
and the benign pairs are shown in Fig. 11. At this initial
stage, 57% of the estimated lesion locations resulted in an
area overlap of at least 50% with the true lesion locations
and 27% resulted in an area overlap of at least 75% (Fig. 12).

B.  Stage  2 — Refinement  of  search  region  by  warping 
and  alignment 

At the second stage, the location of the search region on
the prior mammogram was first refined by maximizing a
correlation measure between the fan-shaped template
extracted from the current mammogram and the breast
structures on the prior mammogram. The affine transformation
in combination with simplex optimization was then employed
to warp this local region. For the 124 temporal image pairs,
the average Euclidean distance error after the second stage
was 7.5 ± 5.4 mm. At this stage, 59% of the estimated lesion
locations resulted in an area overlap of at least 50% with the
true lesion locations, and 36% resulted in an area overlap of
at least 75%. The average Euclidean distance error at this
stage was reduced compared to that of the first stage;
however, the reduction did not achieve statistical
significance (p = 0.07).

After the simplex optimization, the search region was
reduced to a constant size of 24 mm × 24 mm (= 576 mm²)
centered at the refined fan-shaped region for every prior
mammogram.

Fig. 11. Distribution of Euclidean distance error between the initial
estimate of the mass centroid location on the prior mammogram and the
center of the bounding box of the mass provided by the radiologist for
the malignant and benign pairs after the first detection stage.

C.  Stage  3— Mass  template  matching  and  localization 
of  corresponding  lesion 

At this final stage, a search for the best match between the
lesion template from the current mammogram and a structure
on the prior mammogram was carried out within the refined
search region. This template matching resulted in 87% of the
estimated lesion locations having an area overlap of at least
50% with the true lesion locations. The distributions of the
Euclidean distance error for the malignant and the benign
temporal pairs are shown in Fig. 13. The average distance
between the estimated and the true centroids of the lesions on
the prior mammogram for all 124 pairs was 4.2 ± 5.7 mm with
a maximum of 31.6 mm. These results are summarized in
Table I. For the 87% of the temporal pairs with 50% overlap,
the average distance between the estimated and the true
centroids of the lesions on the prior mammogram was
2.4 ± 2.1 mm with a maximum of 10.2 mm. When a more
stringent criterion of 75% overlap is imposed, 82% of the
masses on the prior mammograms are considered to be
localized (Fig. 14). For the 82% of the temporal pairs with
75% overlap, the average distance between the estimated and
the true centroids of the lesions on the prior mammogram was
2.2 ± 1.9 mm with a maximum of 10.2 mm. The average
Euclidean distance error at this stage was significantly
reduced compared to the error of the first stage
(p = 0.000001) and the error of the second stage
(p = 0.000001).

Fig. 12. Distribution of the area overlap between the estimated and the
true lesion locations for 124 temporal pairs after the first detection
stage.

Fig. 13. Distribution of Euclidean distance error between the estimate of
the mass centroid location on the prior mammogram and the center of the
bounding box of the mass provided by the radiologist for the malignant
and benign pairs after the final detection stage.

Fig. 14. Distribution of the area overlap between the estimated and the
true lesion locations for 124 temporal pairs after the final detection
stage.

D.  Study  of  the  importance  of  the  stage  2  procedures 

The effect of the two procedures at Stage 2 on the
registration accuracy was studied. We removed them one at a
time and evaluated the registration results. When the first
(correlation) procedure was removed, the average Euclidean
distance error increased to 5.6 ± 8.2 mm in the final stage.
Only 81% of the estimated lesion locations resulted in an
area overlap of at least 50% with the true lesion locations
and 75% resulted in an area overlap of at least 75% with the
true lesion locations. When the second (warping) procedure
was removed, the average Euclidean distance error increased
to 5.0 ± 6.3 mm in the final stage. Only 82% of the estimated
lesion locations resulted in an area overlap of at least 50%
with the true lesion locations and 76% resulted in an area
overlap of at least 75% with the true lesion locations.

Table I. The Euclidean distance between the true and the estimated
centroids of the mass on the prior mammogram for the three detection
stages.

Stage     Measure              Overall    50% overlap   75% overlap
Stage 1   Mean distance        8.4 mm     5.6 mm        4.5 mm
          Standard deviation   5.4 mm     2.8 mm        2.6 mm
          Max. distance        29.0 mm    16.2 mm       13.8 mm
Stage 2   Mean distance        7.5 mm     4.9 mm        3.9 mm
          Standard deviation   5.4 mm     3.0 mm        2.6 mm
          Max. distance        32.0 mm    16.9 mm       11.6 mm
Stage 3   Mean distance        4.2 mm     2.4 mm        2.2 mm
          Standard deviation   5.7 mm     2.1 mm        1.9 mm
          Max. distance        31.6 mm    10.2 mm       10.2 mm

VI.  DISCUSSION 

The approach proposed here has simplified the first stage
compared to our previous method.¹⁰ In the previous method,
the distances between the nipple and the breast centroid on
the current and prior mammograms were determined and
used to estimate a radial scaling factor. The angular location
of the mass was measured from the nipple-breast centroid
axis. A global alignment procedure was used to determine
the breast centroids. With our new approach we eliminated
the scaling of the radial distance between the nipple and the
mass location on the prior mammogram. The breast periphery
was used as a reference for the estimation of the angular
position of the mass. Therefore, there was no need to
determine the breast centroids on the current and the prior
mammograms, and the global alignment procedure could be
eliminated. This is possible because the local alignment step
provides better compensation for the displacement of the
corresponding masses on the current and the prior
mammograms caused by different compression and
positioning of the breast.

It was found that the estimation of the angular position
from the breast periphery allowed more precise localization
of the mass position on the prior mammogram compared to
our previous method, where the angular position of the mass
was estimated based on the nipple-breast centroid axis.¹⁰
There is a large variability in the estimation of the breast
centroid location because the extent of the breast imaged on
the mammogram at the chest wall and at the axillary tail in
the MLO view depends on the breast positioning and
compression. This causes an uncertainty in defining the region
used to calculate the breast centroid. In the previous study
using 74 temporal pairs, the estimated Euclidean distance
error at the first stage was 9.8 ± 6.0 mm. The fan-shaped
search region was defined as $\epsilon = 0.35 + 5/R_{curr}$,
resulting in an average area of 1865 mm² for the fan-shaped
search region. In the current study, the estimated Euclidean
distance error at the first stage was reduced to 8.4 ± 5.4 mm
even though the data set was increased to 124 temporal pairs
of mammograms. This allows the fan-shaped region to be
reduced to $\epsilon = 0.25 + 5/R_{curr}$, resulting in an
average fan-shaped search area of 1462 mm² on the prior
images. The reduction of the search area improves the chance
of correctly localizing the mass on the prior mammogram.

Fig. 15. The visibility and the mass size of nine malignant temporal
pairs having area overlap less than 50%. The radiologist was unable to
define the prior mass sizes of pairs 6 and 9 due to the subtlety of
these masses.

Fig. 16. The visibility and the mass size of seven benign temporal pairs
having area overlap less than 50%.

The second stage combined two procedures. First, the
localization of the search region on the prior mammograms
was refined by maximizing a correlation measure between
the fan-shaped template extracted from the current
mammogram and the breast structures on the prior mammogram.
The affine transformation in combination with simplex
optimization was then employed to warp and locally align the
template with the breast structures. Both procedures improved
the detection process. When one of these procedures was
removed, the registration results deteriorated, as discussed in
the Results section.

With these improvements, the accuracy of the current
regional registration technique is improved over the previous
method.¹⁰ The current technique produced an average
Euclidean distance error of 4.2 ± 5.7 mm, compared to
5.4 ± 7.5 mm when the previous technique was applied to the
current data set. This difference is statistically significant
(p = 0.03). Eighty-two percent of the estimated lesion
locations resulted in an area overlap of at least 75% with the
true lesion locations, compared with 72% when applying the
previous technique to the current data set. It is interesting to
note that, of the 21 "masses" on the prior mammograms for
which the experienced radiologist could not confidently
define the mass and measure its size, our registration
technique localized 19 with an area overlap greater than 50%.

The average distance between the estimated and the true
centroids of the lesions on the prior mammogram for the
subset of temporal pairs having 50% overlap is about half of
that of the entire data set (Table I). The maximum distance
for this subset is about 1/3 of that for the entire data set.

With the current regional registration technique, 16
temporal pairs (13% of 124 temporal pairs) have an area overlap
of less than 50%. Twelve of the 16 computer-estimated
locations do not overlap at all with the radiologist's identified
locations, and the other four pairs have an overlap between
1% and 49%. Seven of the 16 are benign and nine are
malignant. A major cause of the misregistration was that the
mass was small and subtle and a breast structure within the
search region had a higher correlation with the mass template
from the current mammogram. Figures 15 and 16 show the
visibility ratings and sizes of these misregistered masses.
Eight of the nine misregistered malignant masses have
visibility ratings of 9 or 10 and sizes below 5 mm. The
misregistered benign masses are somewhat more obvious and
larger in size than the malignant ones. Since many of the
masses on the prior mammograms were not interpreted as a
mass without reference to the current mammograms,
automatic registration with template matching would be
difficult for these masses if the search region contains normal
but dense breast structures. We are currently investigating the
application of local mass detection in the search region to
focus template matching on a few suspicious areas.
Morphological and texture features will be extracted from the
potential mass areas to provide additional matching
information in the feature space.

The interval change analysis, when fully developed, will
be one of the functions provided in an integrated CAD
system. The mass on the current mammogram can be detected
by an automated mass detection algorithm or identified by a
radiologist. The CAD system will then analyze whether the
mass is an existing or a newly developed lesion and will
estimate its likelihood of malignancy. We are developing
methods for characterization of malignant and benign masses
based on analysis of interval changes in the mass features.
Investigation of criteria to determine whether a mass exists
on the prior mammogram is underway. If the mass is a newly
developed lesion on the current mammogram, it will then
undergo a single-exam analysis by the CAD system.

VII.  CONCLUSION 

We are developing an automated registration technique
for analysis of interval change of a mass from a previous
mammographic exam to the current one. In this study we
found that a local affine transformation in combination with
nonlinear simplex optimization can improve the localization
and reduce the size of the search region. With the improved
method, 87% of the estimated lesion locations in 124
randomly selected temporal pairs resulted in an area overlap
of at least 50% with the true lesion locations. When the
threshold for correct localization was set to 75% area overlap,
82% of the temporal pairs still exceeded this threshold. The
average distance between the estimated and the true centroids
of the lesions on the prior mammogram over all pairs was
4.2 ± 5.7 mm. The registration accuracy of the current method
has been improved in comparison with that of our previous
method¹⁰ even though the data set was increased from 74
pairs to 124 pairs. This improvement is obtained mainly
from the second-stage affine transformation and simplex
optimization. Additional studies are currently underway to
develop a feature matching method to further improve lesion
localization.

APPENDIX A: DEFINITION OF THE FAN-SHAPED
REGION ON THE PRIOR MAMMOGRAM

Referring to Figs. 3 and 4, the fan-shaped region on the prior
mammogram is drawn with the nipple centroid on the prior
mammogram, N', as the center of the coordinate system. The
two bounding arcs are drawn using the radial distances
$R_{curr} + \delta$ and $R_{curr} - \delta$, both centered at N'.
The two sides of the fan-shaped region are bounded by two
radial lines that form angles $\epsilon$ and $-\epsilon$ with
the line |N'M'|. Thus the initial fan-shaped search region is
centered at the predicted location of the mass centroid M' on
the prior mammogram (Fig. 4).

The constants $k_1$, $k_2$, and $k_3$ were chosen experimentally
based on analysis of the angular deviation errors and the
corresponding radial deviation errors for the 124 temporal
pairs. The radial deviation error is defined as the difference
between the predicted and the true distance of the mass from
the nipple on the prior mammogram. The constants $k_1$ and
$k_2$ are obtained in such a way that $\epsilon$ is the smallest
upper bound that can enclose all angular deviation errors for
all radial distances ($R_{curr}$) of the temporal pairs. The
selection of the parametric form of $\epsilon$ was discussed in
detail in Ref. 10. It reduces $\epsilon$ at larger $R_{curr}$.
The constant $k_3$ was chosen to be equal to the maximum
radial deviation error.
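
The geometry of the fan-shaped region can be sketched as follows. This sketch rests on assumptions that are not spelled out in this appendix: the parametric form is taken to be ε = k1 + k2/R_curr (in radians) with radial half-width δ = k3, which matches the values ε = 0.25 + 5/R_curr and δ = 20 mm reported for Stage 1; coordinates are in millimeters; and the predicted mass location is assumed not to coincide with the nipple. All function names are hypothetical.

```python
import math


def fan_half_angle(r_curr_mm, k1=0.25, k2=5.0):
    """Angular half-width (radians); assumed form eps = k1 + k2 / R_curr."""
    return k1 + k2 / r_curr_mm


def in_fan_region(point, nipple, predicted_mass, k1=0.25, k2=5.0, k3=20.0):
    """True if point lies inside the fan centered on the predicted mass M'."""
    px, py = point[0] - nipple[0], point[1] - nipple[1]
    mx, my = predicted_mass[0] - nipple[0], predicted_mass[1] - nipple[1]
    r_curr = math.hypot(mx, my)  # assumed > 0
    if abs(math.hypot(px, py) - r_curr) > k3:
        return False  # outside the two bounding arcs
    diff = math.atan2(py, px) - math.atan2(my, mx)
    diff = (diff + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return abs(diff) <= fan_half_angle(r_curr, k1, k2)


def fan_area_mm2(r_curr_mm, k1=0.25, k2=5.0, k3=20.0):
    """Area of the annular sector; the inner radius is clipped at zero."""
    eps = fan_half_angle(r_curr_mm, k1, k2)
    r_out = r_curr_mm + k3
    r_in = max(0.0, r_curr_mm - k3)
    return eps * (r_out ** 2 - r_in ** 2)
```

Under these assumptions the area simplifies to 4δ(k1·R_curr + k2) whenever R_curr exceeds δ, so a smaller k1 shrinks the search area roughly in proportion to the nipple-to-mass distance.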

APPENDIX B: SIMPLEX OPTIMIZATION

An optimization problem can be defined as an error
function that has to be minimized by iterative selection of the
values of its n parameters. We can define an (n + 1)-
dimensional space, where n dimensions (degrees of freedom)
correspond to the error function parameters, and one
dimension is the error function itself. When the error
function is calculated for all possible values of the n
parameters, an error surface in (n + 1)-dimensional space is
obtained. Usually the error functions for real-world
applications are complex and nonlinear, and the corresponding
error surfaces contain local minima.

The nonlinear simplex optimization by Nelder and
Mead defines a hyper-polygon with n + 1 vertices in the
(n + 1)-dimensional space. For each vertex the error function
is calculated. The polygon is then "rolled" towards the
minimum. The movement of the polygon (towards the
minimum) is obtained by reflection in the direction opposite to
the vertex (K) with the maximal error. To achieve this, the
center of mass (L) of the hyper-polygon vertices is
calculated. A line KL connects the center of mass with the
vertex with the maximal error. The new vertex (K') is
obtained by central projection of the vertex K on the line KL
with center L and |K'L| = t|KL|. The coefficient t
determines how far the new vertex will be projected and what
the corresponding size of the hyper-polygon will be. The
larger the hyper-polygon is, the more easily it will avoid
("roll over") the local minima on the error surface. However,
it will be difficult to get close to the global minimum if its
size is too large. On the other hand, although a small
hyper-polygon can get into close proximity to the global
minimum, it is more likely to be trapped in a local minimum.
The magnitude of the coefficient t is controlled adaptively by
the Nelder and Mead algorithm. If a large reduction in the
error is detected for the new vertex, the magnitude of t is
increased. If the error is found to be increased for the new
vertex, the magnitude of t is decreased.
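
The reflection step described above can be written compactly. The sketch below is illustrative only and covers a single move, not the complete Nelder and Mead algorithm. It assumes, as in the standard formulation, that the center of mass L is computed over all vertices except the worst one (the text does not say whether K is excluded), and that vertices hold only the n parameter values. The function name is hypothetical.

```python
def reflect_worst_vertex(vertices, errors, t=1.0):
    """Project the worst vertex K through the centroid L of the others.

    Returns (index_of_K, K') with K' = L + t * (L - K),
    so that |K'L| = t * |KL|.
    """
    worst = max(range(len(vertices)), key=lambda i: errors[i])
    others = [v for i, v in enumerate(vertices) if i != worst]
    dim = len(vertices[0])
    # Center of mass of the remaining vertices (assumed definition of L).
    centroid = [sum(v[d] for v in others) / len(others) for d in range(dim)]
    k = vertices[worst]
    reflected = [centroid[d] + t * (centroid[d] - k[d]) for d in range(dim)]
    return worst, reflected
```

Increasing `t` after a successful move and decreasing it after a failed one gives the adaptive behavior the text describes.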
In this paper, the nonlinear simplex optimization by
Nelder and Mead was used to adjust the coefficients a, b, c,
d, e, and f and to warp the fan-shaped template, thereby
maximizing the correlation (C) between the template and a
breast structure on the prior mammogram. Therefore, the
dimensionality of the space was seven: six parameters to be
adjusted plus the error function to be minimized, which was
defined as 1 - C.
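
An error function of this kind can be sketched as below. The details are assumptions for illustration and are not taken from the paper: the six coefficients are taken to define the mapping x' = a·x + b·y + c, y' = d·x + e·y + f, the prior image is sampled with nearest-neighbor interpolation, C is the Pearson correlation over the template pixels that land inside the prior image, and fixed penalty values are returned when C cannot be computed. The function names are hypothetical.

```python
import math


def affine_map(x, y, params):
    """Map template coordinates into the prior image (assumed convention)."""
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f


def registration_error(params, template, prior):
    """Return 1 - C for the template warped onto the prior image."""
    t_vals, p_vals = [], []
    for y, row in enumerate(template):
        for x, value in enumerate(row):
            u, v = affine_map(x, y, params)
            col, line = int(round(u)), int(round(v))
            if 0 <= line < len(prior) and 0 <= col < len(prior[0]):
                t_vals.append(value)
                p_vals.append(prior[line][col])
    if len(t_vals) < 2:
        return 2.0  # penalty: template mapped outside the image
    mean_t = sum(t_vals) / len(t_vals)
    mean_p = sum(p_vals) / len(p_vals)
    num = sum((a - mean_t) * (b - mean_p) for a, b in zip(t_vals, p_vals))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t_vals)
                    * sum((b - mean_p) ** 2 for b in p_vals))
    if den == 0:
        return 1.0  # flat region: treat as uncorrelated (C = 0)
    return 1.0 - num / den
```

A simplex optimizer would call `registration_error` with candidate values of the six coefficients and keep the set that gives the smallest value.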

Rationale and Objectives. The authors performed this study to evaluate the effects of pixel size on the characterization
of mammographic microcalcifications by radiologists.

Materials and Methods. Two-view mammograms of 112 microcalcification clusters were digitized with a laser scanner at
a pixel size of 35 μm. Images with pixel sizes of 70, 105, and 140 μm were derived from the 35-μm-pixel size images
by averaging neighboring pixels. The malignancy or benignity of the microcalcifications had been determined with
findings at biopsy or 2-year follow-up. Region-of-interest images containing the microcalcifications were printed with a laser
imager. Seven radiologists participated in a receiver operating characteristic (ROC) study to estimate the likelihood of
malignancy. The classification accuracy was quantified with the area under the ROC curve ($A_z$). The statistical significance of
the differences in the $A_z$ values for different pixel sizes was estimated with the Dorfman-Berbaum-Metz method and the
Student paired t test. The variance components were analyzed with a bootstrap method.

Results. The higher-resolution images did not result in better classification; the average $A_z$ with a pixel size of 35 μm was
lower than that with pixel sizes of 70 and 105 μm. The differences in $A_z$ between different pixel sizes did not achieve
statistical significance.

Conclusion. Pixel sizes in the range studied do not have a strong effect on radiologists' accuracy in the characterization
of microcalcifications. The low specificity of the image features of microcalcifications and the large interobserver and
intraobserver variabilities may have prevented small advantages in image resolution from being observed.

Key Words. Breast neoplasms, calcification; breast radiography, comparative studies; breast radiography, technology;
receiver operating characteristic curve (ROC).

Breast cancer is one of the leading causes of death in
women between the ages of 40 and 55 years. In the
United States, the mortality rate for breast cancer in
women is the second highest of all cancers, and breast
cancer was estimated to account for 16% of all cancer
deaths in 1998 (1). Studies have indicated that early
detection and treatment improve the chances of survival for
breast cancer patients. At present, mammography is the
only proven method that consistently demonstrates
minimal breast cancers (2,3). The image quality with
conventional mammography, however, is limited by the dynamic
range of screen-film systems. The contrast sensitivity of
screen-film mammograms is very poor in the overpenetrated
periphery and the underpenetrated dense fibroglandular
tissue regions on the breast image. Recently, a digital
mammography system has received U.S. Food and
Drug Administration clearance for clinical use. Digital
mammography detectors are expected to provide a wider
dynamic range than screen-film systems and, thus,
increase the contrast sensitivity in the periphery and dense
regions of the breast. The improved image quality is
expected to lead to an improvement in the accuracy of
breast cancer diagnosis.

The  spatial  resolution  of  current  digital  detectors  is 
generally  lower  than  that  of  screen-film  systems.  Digital 
detectors  used  in  the  full-field  digital  mammography  sys¬ 
tems  that  are  commercially  available  or  under  develop¬ 
ment  have  pixel  sizes  in  the  range  of  40  X  40  /xm  to  100  X 
100  fxm,  which  correspond  to  nominal  spatial  resolution  of 
about  12  line  pairs  per  millimeter  to  5  line  pairs  per  milli¬ 
meter.  In  contrast,  the  spatial  resolution  of  mammographic 
screen-film  systems  generally  exceeds  20  line  pairs  per  milli¬ 
meter.  Higher-resolution  digital  detectors  require  smaller 
pixel  sizes.  The  development  of  digital  detectors  with  small 
pixel  sizes,  however,  is  not  only  technologically  demanding, 
but  the  requirements  for  image  transmission,  archiving,  and 
display  increase  rapidly  as  the  matrix  size  increases.  The 
trade-offs  between  spatial  resolution  and  cost  and  effi¬ 
ciency  are  important  considerations  in  the  development  of 
digital  mammography  systems.  The  maximum  pixel  size 
acceptable  for  performing  mammography  without  reduc¬ 
ing  the  detectability  of  subtle  breast  cancers  is  unknown. 
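
The correspondence between pixel size and nominal spatial resolution quoted above follows from the sampling limit: one line pair needs at least two pixels. The short sketch below illustrates that arithmetic only; the function name is ours and it is not part of the study software.

```python
def nyquist_lp_per_mm(pixel_size_um: float) -> float:
    """Nominal limiting resolution (line pairs per mm) for a square pixel.

    One line pair spans at least two pixels, so the limit is
    1 / (2 * pixel pitch), with the pitch expressed in millimeters.
    """
    pixel_size_mm = pixel_size_um / 1000.0
    return 1.0 / (2.0 * pixel_size_mm)


for size_um in (40, 100):
    print(f"{size_um} um pixel -> {nyquist_lp_per_mm(size_um):.1f} lp/mm")
```

For the 40-µm and 100-µm pixels this gives 12.5 and 5.0 line pairs per millimeter, in line with the approximate values of 12 and 5 quoted in the text.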

One  of  the  important  signs  of  breast  cancer  is  clustered 
microcalcifications  (4),  which  can  be  seen  on  mammo¬ 
grams  in  30%-50%  of  breast  cancers  (5-8).  Microcalcifi¬ 
cations  associated  with  early  breast  cancers  are  usually 
smaller than about 500 µm. Among the image features
that  may  indicate  the  presence  of  breast  cancer,  microcal¬ 
cifications  are  the  smallest.  Therefore,  the  spatial  resolu¬ 
tion  required  for  the  detection  and  characterization  of  sub¬ 
tle  microcalcifications  on  mammograms  may  be  regarded 
as  the  lower  bound  for  the  resolution  of  a  mammographic 
detector.  In  a  previous  receiver  operating  characteristic 
(ROC)  study  (9),  we  compared  the  detectability  of  subtle 
microcalcifications  on  original  screen-film  mammograms 
with that on mammograms digitized at a pixel size of 100
µm with an optical drum scanner. We found that the detection
accuracy of subtle microcalcifications decreased
when  radiologists  read  the  digitized  images.  Although  the 
detection  accuracy  improved  after  the  digitized  images 
were  enhanced  with  unsharp  mask  filtering,  it  remained 
lower than that with the original screen-film mammograms.
In another study (10), we investigated the detectability
of individual microcalcifications on digitized mammograms
by using a computer program. Those results also indicated
a reduction in detectability when the digitization pixel
size increased from 35 to 140 µm.

Malignant  microcalcifications  may  exhibit  linear  and 
branching  shapes,  as  well  as  variations  in  shape  and  size 
within  a  cluster.  Benign  microcalcifications  tend  to  be 
round  and  smooth,  with  relatively  uniform  shapes  and 
sizes  within  a  cluster.  The  visibility  of  the  detailed  shapes 
is  dependent  on  the  spatial  resolution  of  the  image  record¬ 
ing  system.  Therefore,  it  is  generally  believed  that  a 
higher  spatial  resolution  is  required  to  differentiate  malig¬ 
nant  from  benign  microcalcifications  than  to  detect  micro¬ 
calcifications.  Results  of  some  recent  studies,  however, 
indicate  that  this  may  not  be  the  case.  Karssemeijer  et  al 
(11)  performed  an  ROC  study  to  compare  the  accuracy  of 
classifying  microcalcifications  on  original  screen-film 
mammograms  with  that  on  images  digitized  at  a  pixel 
size of 100 µm and viewed on a display monitor. They
found  that  there  was  no  statistically  significant  difference 
in  the  classification  accuracy  between  the  two  reading 
conditions. Kallergi et al (12) also performed an ROC
study to compare the detection and classification of clustered
microcalcifications at three reading conditions:
screen-film mammograms, images digitized at a pixel size
of 105 µm and displayed on a monitor, and wavelet-enhanced
digitized images displayed on a monitor. They
found  that  the  detection  with  the  original  mammograms 
was  much  better  than  that  with  the  digitized  mammo¬ 
grams  displayed  on  a  monitor;  the  use  of  wavelet  en¬ 
hancement,  however,  reduced  the  difference.  The  charac¬ 
terization  of  microcalcifications  was  not  substantially  dif¬ 
ferent  among  the  three  reading  conditions. 

We  performed  this  ROC  study  to  evaluate  the  effects 
of  pixel  size  on  the  characterization  of  malignant  and  be¬ 
nign  microcalcifications  on  digitized  mammograms.  Two- 
view  mammograms  were  digitized  and  displayed  as  laser- 
printed  film  images  at  four  pixel  sizes  ranging  from  35  to 
140 µm. Seven radiologists experienced in mammography
estimated  the  likelihood  of  malignancy.  The  dependence 
of  classification  accuracy  on  pixel  size  was  analyzed  with 
ROC  methodology. 


MATERIALS  AND  METHODS 


Data  Set 

Digital  mammograms  were  obtained  by  digitizing 
screen-film mammograms with a laser film scanner. One
hundred twelve microcalcification clusters were selected
from  100  patient  cases  in  the  Breast  Imaging  Division  at 
the  University  of  Michigan  with  approval  from  the  Insti¬ 
tutional  Review  Board.  Two-view  mammograms  of  each 
cluster were digitized. The two views included a craniocaudal
view and a mediolateral oblique or lateral view.

Forty  of  the  microcalcification  clusters  were  proved  at 
biopsy  to  be  malignant,  and  65  were  proved  at  biopsy  to 
be  benign.  The  other  seven  clusters  were  considered  to  be 
benign  based  on  findings  of  at  least  2  years  of  follow-up. 
Of  the  40  malignant  clusters,  25  were  ductal  carcinoma  in 
situ.  The  distribution  of  the  sizes  (the  longest  dimension) 
of  the  microcalcification  clusters  is  shown  in  Figure  1. 

The  longest  dimension  of  the  clusters  ranged  from  2.0  to 
18.0  mm  (mean,  6.4  mm).  Seven  of  the  benign  microcal¬ 
cifications  and  five  of  the  malignant  microcalcifications 
were  spread  over  an  area  larger  than  20  mm  in  diameter 
and,  thus,  were  considered  to  be  diffuse.  The  data  set  in¬ 
cluded  microcalcifications  with  a  range  of  subtleties.  The 
subtlety  of  the  microcalcifications  was  rated  by  a  radiolo¬ 
gist  experienced  in  mammography  (M.A.H.)  on  a  scale  of 
1  (obvious)  to  10  (subtle)  relative  to  the  visibility  range 
of  microcalcifications  encountered  in  clinical  practice. 

The  subtlety  ratings  are  shown  in  Figure  2.  The  malignant 
and  benign  microcalcifications  were  similarly  distributed, 
with  the  benign  microcalcifications  slightly  more  subtle 
than  the  malignant  clusters. 

All  mammograms  were  digitized  at  a  pixel  size  of 
35 × 35 µm with 12-bit gray levels by using a laser
scanner (DIS-1000, Lumisys, Los Altos, Calif). The digitizer
had an optical density range of about 0 to 3.5. It was
calibrated  such  that  the  optical  density  on  film  was  lin¬ 
early  proportional  to  the  pixel  value  at  0.001  optical  den¬ 
sity  units  per  pixel  value  in  the  optical  density  range  of 
about  0-2.8.  The  pixel  values  of  the  images  were  linearly 
inverted  so  that  large  pixel  values  represented  a  low  opti¬ 
cal  density.  The  resolution  of  the  scanner  was  evaluated 
by  digitizing  test  film  images  with  line  pair  patterns.  It 
was  found  that  line  pair  patterns  up  to  14.3  line  pairs  per 
millimeter  could  be  resolved  on  the  digitized  image  (10). 
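
The calibration described above can be illustrated with a small conversion routine. This is only a sketch under stated assumptions: the text gives the slope (0.001 optical density units per pixel value) and the inversion, but not the offset, so we assume here that the inversion is about the 12-bit maximum (4095) and that an optical density of 0 maps to that maximum. The constant and function names are ours.

```python
OD_PER_PIXEL_VALUE = 0.001  # calibration slope stated in the text
MAX_PIXEL_VALUE = 4095      # 12-bit maximum; assumed inversion point


def optical_density(inverted_pixel_value: int) -> float:
    """Approximate film optical density for an inverted 12-bit pixel value.

    Assumes zero offset, so that a pixel value of 4095 means OD 0. The text
    states that the linear relation holds only up to an OD of about 2.8.
    """
    if not 0 <= inverted_pixel_value <= MAX_PIXEL_VALUE:
        raise ValueError("pixel value outside the 12-bit range")
    return (MAX_PIXEL_VALUE - inverted_pixel_value) * OD_PER_PIXEL_VALUE


print(optical_density(4095))            # brightest stored value, lowest OD
print(round(optical_density(1295), 3))  # upper end of the linear range
```

Under these assumptions the linear range of about 0-2.8 optical density units corresponds to inverted pixel values from 4095 down to 1295.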

Figure 1. The size distribution of the microcalcification clusters.

Figure 2. Distribution of the subtlety ratings for the microcalcification
clusters. 1 = most obvious, 10 = most subtle.

A 1,024 × 1,024-pixel region of interest (ROI) containing
the microcalcifications was extracted from the digitized
image. Except for clusters that were close to the
chest wall or in the breast periphery, the extracted cluster
was usually centered within the ROI. Diffuse microcalcifications
that were larger than the ROI were truncated to
the size of the ROI. Microcalcification images digitized
with pixel sizes of 70, 105, and 140 µm were simulated
from the image with the 35-µm pixel size by averaging
2 × 2, 3 × 3, and 4 × 4 neighboring pixels, respectively.
Because ROIs of different pixel sizes were derived from
the same digitized image, there would not be differences
in image quality caused by the reproducibility of digitization.
The actual size of all ROIs corresponded to an area
of 35.8 × 35.8 mm on the original mammograms, regardless
of the pixel sizes.
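
The simulation of larger pixel sizes by averaging n × n neighboring pixels can be sketched as follows. This is an illustration rather than the code used in the study. In particular, a 1,024-pixel side is not a multiple of 3, and the text does not say how the edge was handled; the sketch simply drops rows and columns that do not fill a complete block, which is our assumption.

```python
def block_average(image, n):
    """Average non-overlapping n x n blocks of a 2-D list of gray levels.

    Rows and columns that do not fill a complete block are dropped
    (an assumption; the source does not describe edge handling).
    """
    rows = len(image) // n
    cols = len(image[0]) // n
    reduced = []
    for r in range(rows):
        reduced_row = []
        for c in range(cols):
            total = 0
            for dr in range(n):
                for dc in range(n):
                    total += image[r * n + dr][c * n + dc]
            reduced_row.append(total / (n * n))
        reduced.append(reduced_row)
    return reduced


# Toy 12 x 12 "ROI" standing in for a 1,024 x 1,024 image at 35 um.
roi = [[r * 12 + c for c in range(12)] for r in range(12)]
for n, pixel_um in ((2, 70), (3, 105), (4, 140)):
    simulated = block_average(roi, n)
    print(f"{pixel_um} um: {len(simulated)} x {len(simulated[0])} pixels")
```

Each output pixel covers n times the original pitch, so the simulated images have pixel sizes of 70, 105, and 140 µm while covering the same physical area.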

Because the use of display monitors to view images
can introduce variables that may be difficult to control,
we printed the ROI images on film with a laser imager
(model 969HQ; Imation, Oakdale, Minn) for the observer
performance study. To reduce the effects of image size on
characterization, the ROIs with the three larger pixel sizes
(ie, smaller matrix sizes for the same ROI image) were
enlarged to the same printed size as that of the 35-µm
pixel size images by means of interpolation. Sixteen
interpolation schemes were available from the laser imager
interface software. To choose the best interpolation
scheme for this study, we printed an image of a cluster
containing microcalcifications of different sizes and
shapes at pixel sizes of 70, 105, and 140 µm by using the
16 interpolation schemes. The images of 35-µm pixel size
were also printed. A radiologist who was qualified under
the requirements of the Mammography Quality Standards
Act visually compared the printed images and numbered
his top three choices for each set of images. The radiologist
was not aware of the specific schemes. After the decision
was made, he informed us that his criteria were a
balance between blockiness and blurriness on the enlarged
image and its similarity to the 35-µm image. The experiment
was repeated two times, with the sessions separated
by more than a month. The top two choices obtained
from the two readings were consistent. The top two
choices were essentially indistinguishable so that one of
them was used to print the images. The chosen scheme
was a convolution interpolation that filled the interpolated
pixels with smooth weighted gray levels of the adjacent
pixels.

Table 1
Confidence Rating Scale

Rating  Likelihood of    Suspicion Level                                     BI-RADS
        Malignancy (%)                                                       Category
1       0-2              Benign, probably benign                             2, 3
2       3-20             Suspicious, with low probability of malignancy      4
3       21-30            Suspicious, with low probability of malignancy      4
4       31-40            Suspicious, with moderate probability of malignancy 4
5       41-50            Suspicious, with moderate probability of malignancy 4
6       51-60            Suspicious, with moderate probability of malignancy 4
7       61-70            Suspicious, with moderate probability of malignancy 4
8       71-80            Highly suggestive (high probability) of malignancy  5
9       81-90            Highly suggestive (high probability) of malignancy  5
10      91-100           Highly suggestive (high probability) of malignancy  5

The printed ROIs measured 84 × 84 mm, which corresponded
to a pixel pitch of about 82 µm for the laser imager.
The printed ROIs were therefore magnified by a
factor of about 2.3 compared with their size on the original
screen-film mammograms. Because radiologists routinely
view microcalcifications with a magnifying lens or
on  a  magnified  spot  mammogram,  however,  the  magnifi¬ 
cation  should  not  affect  the  classification  of  the  microcal¬ 
cifications.  To  maintain  the  same  displayed  contrast  for 
images  of  different  pixel  sizes,  the  four  ROIs  of  different 
pixel  sizes  were  printed  on  the  same  piece  of  film  and, 
thus,  developed  at  the  same  time.  This  minimized  the  ef¬ 
fects  of  any  potential  fluctuations  in  the  printer  calibration 
and  in  the  development  conditions  of  the  laser  film  on  the 
relative  density  and  contrast  of  the  printed  images. 
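
The print geometry quoted above can be checked with a few lines of arithmetic. The sketch assumes, as the quoted pitch implies, that the 1,024-pixel ROI at the 35-µm pixel size is printed with one image pixel per printer pixel across the 84-mm side; the variable names are ours.

```python
ROI_PIXELS = 1024      # ROI matrix size at the 35-um pixel size
SCAN_PIXEL_UM = 35.0   # digitization pixel size
PRINT_SIDE_MM = 84.0   # side length of the printed ROI

# Physical side of the ROI on the original mammogram.
roi_side_mm = ROI_PIXELS * SCAN_PIXEL_UM / 1000.0
# Printer pixel pitch when 1,024 pixels span the printed side.
print_pitch_um = PRINT_SIDE_MM / ROI_PIXELS * 1000.0
# Linear magnification of the print relative to the original film.
magnification = PRINT_SIDE_MM / roi_side_mm

print(f"ROI side on film: {roi_side_mm:.2f} mm")
print(f"printer pitch:    {print_pitch_um:.1f} um")
print(f"magnification:    {magnification:.2f}")
```

The results (about 35.8 mm, 82 µm, and 2.3) agree with the figures given in the text.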

Observer  Performance  Study 

Seven  radiologists,  all  of  whom  were  qualified  under  the 
requirements  of  the  Mammography  Quality  Standards  Act 
to  read  and  routinely  interpret  mammograms,  participated 
as observers. The radiologists had 3-20 years of experience
in  mammographic  interpretation.  Because  there  were  112 
ROIs  and  four  pixel  sizes  for  each  ROI,  a  total  of  448 
images  were  read  by  each  observer.  The  two  views  of 
each  cluster  at  the  same  pixel  size  were  read  side  by  side. 
The  observers  were  not  informed  of  the  prevalence  of 
malignant  cases  or  the  proportion  of  biopsy  cases.  Each 
observer  read  the  ROI  images  in  four  reading  sessions. 
Every  reading  session  was  separated  from  the  previous 
one  by  at  least  2  weeks.  In  each  session,  one-quarter  of 
the  images  of  each  pixel  size  were  read.  Each  case  ap¬ 
peared  once  and  only  once  in  each  session.  The  reading 
orders  of  the  images  in  each  pixel  size  were  counterbal¬ 
anced  such  that,  on  average,  no  images  of  a  given  pixel 
size  were  read  in  a  given  order  (eg,  read  first  by  the  ob¬ 
servers)  more  often  than  images  of  any  other  pixel  sizes. 
The  reading  order  of  the  images  was  randomized  differ¬ 
ently  for  each  observer.  This  systematic  randomization 
reading  scheme  minimized  any  potential  learning  effects 
on  the  reading  results  (13).  The  observers  were  allowed 
unlimited  reading  time. 
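
One way to build a reading schedule with the properties described above is a rotation (Latin square) of pixel sizes over four case groups, followed by a random reading order per observer. The sketch below is hypothetical: the text states the constraints (every case once per session, one-quarter of the images of each pixel size per session, a different random order for each observer) but not the exact procedure, so the grouping and rotation are our assumptions.

```python
import random
from collections import Counter

PIXEL_SIZES_UM = (35, 70, 105, 140)
N_CASES = 112
N_SESSIONS = 4


def build_schedule(observer_seed):
    """Return four sessions, each a list of (case, pixel size) pairs."""
    rng = random.Random(observer_seed)
    cases = list(range(N_CASES))
    rng.shuffle(cases)  # random split of the cases into four equal groups
    sessions = []
    for s in range(N_SESSIONS):
        session = []
        for position, case in enumerate(cases):
            group = position % len(PIXEL_SIZES_UM)
            # Rotate the pixel size shown to each group from session to session.
            pixel = PIXEL_SIZES_UM[(group + s) % len(PIXEL_SIZES_UM)]
            session.append((case, pixel))
        rng.shuffle(session)  # observer-specific reading order
        sessions.append(session)
    return sessions


schedule = build_schedule(observer_seed=1)
for number, session in enumerate(schedule, start=1):
    counts = Counter(pixel for _, pixel in session)
    print(f"session {number}: {sorted(counts.items())}")
```

With 112 cases, each session contains 28 images of each pixel size, and each case is seen at all four pixel sizes over the four sessions.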

The  likelihood  that  the  microcalcifications  were  malig¬ 
nant  was  rated  with  a  10-point  confidence  rating  scale. 

The  confidence  rating  scale  was  designed  and  related  to 
the  Breast  Imaging  Reporting  and  Data  System  (BI-RADS) 
ratings by an experienced radiologist, as shown in Table
1. A likelihood of malignancy of less than 2% for benign
or probably benign mammographic abnormalities was
chosen on the basis of the studies by Sickles (14,15). The
observers also rated the subtlety of each case according to
a 10-point scale (1 = most obvious, 10 = most subtle)
on the basis of their perception of the cluster relative to
their experience with clinical cases.
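
The confidence rating scale of Table 1 amounts to a lookup from an estimated likelihood of malignancy to a rating and its BI-RADS category. The sketch below transcribes the table; the function names are ours, and the lookup is defined only for whole-number percentages because the table lists integer ranges.

```python
# (rating, lowest %, highest %, BI-RADS categories), transcribed from Table 1
RATING_SCALE = (
    (1, 0, 2, (2, 3)),
    (2, 3, 20, (4,)),
    (3, 21, 30, (4,)),
    (4, 31, 40, (4,)),
    (5, 41, 50, (4,)),
    (6, 51, 60, (4,)),
    (7, 61, 70, (4,)),
    (8, 71, 80, (5,)),
    (9, 81, 90, (5,)),
    (10, 91, 100, (5,)),
)


def rating_for_likelihood(percent):
    """Return the 10-point rating for a whole-number likelihood (0-100)."""
    for rating, low, high, _ in RATING_SCALE:
        if low <= percent <= high:
            return rating
    raise ValueError("likelihood must be a whole number from 0 to 100")


def birads_for_rating(rating):
    """Return the BI-RADS categories associated with a rating."""
    for scale_rating, _, _, birads in RATING_SCALE:
        if scale_rating == rating:
            return birads
    raise ValueError("rating must be between 1 and 10")


print(rating_for_likelihood(2), birads_for_rating(1))
print(rating_for_likelihood(75), birads_for_rating(8))
```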

A  table  showing  the  rating  scale  and  the  corresponding 
BI-RADS  category  was  available  to  the  observers  for  ref¬ 
erence  during  the  reading  sessions.  A  training  session  was 
conducted  before  each  reading  session  to  familiarize  the 
observers  with  the  rating  scales.  Three  malignant  and 
three  benign  clusters  not  included  in  the  test  set  were 
used  in  the  training  session.  After  the  rating  scales  were 
explained  to  the  observer,  he  or  she  rated  each  cluster  as 
described  earlier.  They  were  told  the  biopsy  outcome  of 
the  cluster  after  rating  each  training  case.  There  was  no 
“truth”  for  the  subtlety  rating.  The  subtlety  rating  was 
recorded  as  additional  information  about  each  radiolo¬ 
gist’s  subjective  impression  of  a  cluster. 

Analysis  of  Classification  Accuracy 

The confidence ratings of the likelihood of malignancy
were analyzed with ROC analysis (13). The two class
distributions were assumed to be binormal, and an ROC
curve was fitted to the confidence ratings on the basis of
maximum likelihood estimation. The ROC curve represents
the relationship between the true-positive fraction
(sensitivity) and the false-positive fraction (1 - specificity)
as the confidence threshold varies. An ROC curve
was generated for each observer and for images of each
pixel size. The classification accuracy was quantified by
using the area under the ROC curve (Az). The average
ROC curve for each reading condition was derived by
averaging the slope and intercept parameters of the individual
observers' fitted ROC curves. The statistical significance
of the differences in the ROC curves for two pixel
sizes was estimated by using the Dorfman-Berbaum-Metz
(DBM) method for multireader, multicase ROC data (16)
and the Student paired t test for the observer-specific
paired Az values. The paired t test takes into account the
statistical  variation  of  the  readers,  whereas  the  DBM 
method  includes  both  the  reader  variation  and  case  sam¬ 
ple  variation  with  an  analysis-of-variance  approach. 
Therefore,  the  results  with  the  DBM  method  can  be  gen¬ 
eralized  to  the  population  of  readers  as  well  as  the  case 
samples.  In  addition,  the  bootstrap  method  developed  by 
Beiden  et  al  (17)  was  used  to  analyze  the  components  of 
variances  in  this  classification  task. 
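
For readers unfamiliar with the binormal model, the sketch below shows how an ROC curve and its area follow from the intercept and slope parameters, and how an average curve is formed by averaging those parameters. It uses the standard binormal relations, with the true-positive fraction given by Phi(a + b * Phi^-1(FPF)) and the area by Phi(a / sqrt(1 + b^2)). The (a, b) values are made up for illustration and are not the fitted parameters from this study; the maximum likelihood fitting itself is not shown.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def binormal_tpf(fpf, a, b):
    """True-positive fraction at a false-positive fraction (0 < fpf < 1)."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpf))


def binormal_az(a, b):
    """Area under the binormal ROC curve with intercept a and slope b."""
    return STD_NORMAL.cdf(a / sqrt(1.0 + b * b))


def average_parameters(params):
    """Average the intercepts and slopes of several fitted curves."""
    n = len(params)
    mean_a = sum(a for a, _ in params) / n
    mean_b = sum(b for _, b in params) / n
    return mean_a, mean_b


# Hypothetical (intercept, slope) pairs for three readers.
readers = [(0.6, 0.9), (1.0, 1.1), (0.8, 1.0)]
avg_a, avg_b = average_parameters(readers)
print(f"Az of average curve: {binormal_az(avg_a, avg_b):.3f}")
print(f"TPF at FPF = 0.2:    {binormal_tpf(0.2, avg_a, avg_b):.3f}")
```

Note that the Az of the average curve is generally not identical to the mean of the individual Az values, which is why the two are distinguished in Table 2.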


RESULTS 


Images of a small malignant microcalcification cluster and
a benign cluster from our data set obtained with a pixel
size of 35 µm are shown in Figure 3a and 3b, respectively.
The craniocaudal and mediolateral oblique views
of the same cluster are shown side by side. Figure 4
shows one view of a malignant cluster with all four pixel
sizes. Slight blurring of the image details and the noise
can be observed as the pixel size increases from 35 to
140 µm.

The ROC curves for the seven radiologists reading the
images with 35-µm pixel size are shown in Figure 5. The
ROC curves are spread over a relatively wide range. The
Az values for the radiologists are listed in Table 2 and
plotted in Figure 6. The standard deviation of the
Az ranges from 0.05 to 0.07, as estimated with the
LABMRMC program. Only one of the seven radiologists
demonstrated a higher classification accuracy with the
35-µm images than with the 70- or 105-µm images. The
Az-versus-pixel size curve for this radiologist (reader 6)
had a different trend from that of other radiologists. The
Az of another radiologist (reader 7) was basically constant
over the entire range of pixel sizes studied. The average
ROC curves for each pixel size were derived from the
average slope and intercept parameters of the seven individual
ROC curves and are plotted in Figure 7. The dependence
of average Az on pixel size is shown in Table 2.
The average Az was higher with pixel sizes of 70 and
105 µm (0.73) than with 35 or 140 µm (0.71). The differences
in Az between the different pixel sizes did not achieve statistical
significance with either the DBM method (16) or the Student
paired t test. Table 3 shows the P values obtained
with the DBM and the paired t test when images with a
pixel size of 35 µm were compared with those with pixel
sizes of 70, 105, and 140 µm. The P values obtained
with the two methods are very similar, which indicates
that the reader variation is dominant over case variation
in this classification task.
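
As an illustration of the paired t test on observer-specific Az values, the sketch below recomputes the t statistic for the 35-µm versus 70-µm comparison from the rounded values in Table 2. Because the inputs are rounded to two decimals, the result is only approximate. The critical value 2.447 is the standard two-tailed 5% point of the t distribution with 6 degrees of freedom; the function and variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev

# Az values for readers 1-7, rounded as listed in Table 2.
AZ_35 = [0.68, 0.62, 0.75, 0.75, 0.65, 0.74, 0.77]
AZ_70 = [0.73, 0.71, 0.77, 0.80, 0.64, 0.65, 0.77]

T_CRIT_DF6 = 2.447  # two-tailed, alpha = .05, 6 degrees of freedom


def paired_t(x, y):
    """Student paired t statistic for two equally long samples."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


t_stat = paired_t(AZ_35, AZ_70)
print(f"t = {t_stat:.2f}")
print(f"significant at .05: {abs(t_stat) > T_CRIT_DF6}")
```

The statistic is about -0.72, well inside the critical value, which is consistent with the nonsignificant P value of .51 reported in Table 3 for this comparison.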

Because of the outlying trend of reader 6, we performed
the analysis of the classification accuracy without
this reader in an attempt to evaluate the dependence of Az
on pixel size for the majority of radiologists in our study.
For these six readers, the average Az for the four pixel
sizes was 0.71, 0.74, 0.75, and 0.71, respectively. Although
the trend that the radiologists had a higher classification
accuracy with pixel sizes of 70 and 105 µm became
more apparent, the difference in the Az between the
pixel sizes still fell short of statistical significance. The P
value determined with the DBM method was .11 for the
difference in Az between 35- and 70-µm images and .12
for that between 35- and 105-µm images. The corresponding
two-tailed P values with the Student paired t
test were .10 and .12, respectively.

We also analyzed the percentages of positive and negative
cases for which the observers gave a confidence
rating of 1 in each pixel size. A confidence rating of 1
corresponded to a 0%-2% likelihood of malignancy and
BI-RADS categories of benign or probably benign (Table
1). These cases would be returned to a regular screening
schedule or undergo short-interval follow-up without biopsy.
The results are shown in Table 4. Each observer
appeared to have a different threshold for suspicion. For a
given observer, however, the threshold was relatively consistent
among the different pixel sizes. There was no obvious
trend that this threshold depended on pixel size.

Figure 3. (a, c) Craniocaudal and (b, d) mediolateral oblique images
of (a, b) a malignant microcalcification cluster (intraductal carcinoma)
and (c, d) a benign cluster (sclerosing adenosis) digitized
at a pixel size of 35 µm.

Figure 5. The ROC curves for seven radiologists in the evaluation
of the images with 35-µm pixel size. The standard deviation
of the Az ranges from 0.05 to 0.07.

if any, with pixel sizes of 35-140 µm did not achieve
statistical significance. Of the Az-versus-pixel size
curves from seven radiologists, only one showed that a
pixel size of 35 µm provided a larger Az than did pixel
sizes of 70 and 105 µm. Although the variances in the
Az were large, this consistent trend indicates a strong
likelihood that images with a 35-µm pixel size may not
provide a higher accuracy in the differentiation of malignant
from benign microcalcifications than those with
a pixel size of 70 or 105 µm. This finding differs from
the expectation that a smaller pixel size would better
preserve the shape information of microcalcifications
and, consequently, provide higher accuracy in the differentiation
of microcalcifications on mammograms.

Our  findings  are  consistent  with  those  of  Karssemeijer 
et  al  (11)  and  Kallergi  et  al  (12)  who,  in  their  ROC 
studies,  compared  the  classification  accuracy  of  micro¬ 
calcifications  on  original  screen-film  mammograms 
with that on images digitized at a pixel size of 100 µm
and  viewed  on  a  display  monitor. 

Beiden et al (17) recently developed a bootstrap
method for analyzing the variance components in an ROC
experiment. They analyzed our ROC data set and estimated
the variance components and the total variance of
the difference in $A_z$, $\sigma^2(\Delta A_z)$, for any pairing of modalities
(pixel sizes), as shown in Table 5. We used these variances
to estimate whether the finite sample size in our
ROC study is the main factor that caused the insignificant
differences between pixel sizes.

Equation (21) in the article by Beiden et al (17) shows
that the total variance of $\Delta A_z$ is given as

$$\sigma^2(\Delta A_z) = 2\left[\sigma^2_{mc}\,(N_0/N) + \frac{\sigma^2_{mr} + \sigma^2_{\epsilon}\,(N_0/N)}{R}\right],$$

where $R$ is the number of readers; $N_0$ is the sample size of the current
experiment; $N$ is the sample size of a future experiment;
and $\sigma^2_{mc}$, $\sigma^2_{mr}$, and $\sigma^2_{\epsilon}$ are the modality-by-case,
modality-by-reader, and effective error components of the variance,
respectively. The total variance at an infinite sample size,
$N \to \infty$, is thus caused only by the reader variance, as follows:
$\sigma^2(N \to \infty) = 2\sigma^2_{mr}/R$. Therefore, if we can repeat
the ROC experiment with an infinite sample size, the
minimum observed difference in $A_z$ between two modalities,
[min $\Delta A_z(N \to \infty)$], that will allow rejection of the
null hypothesis, $A_z$(small pixel) = $A_z$(large pixel), with
P < .05 can be estimated as [min $\Delta A_z(N \to \infty)$] = 1.645 ·
$\sigma(N \to \infty)$. The values of $\sigma(N \to \infty)$ and [min $\Delta A_z(N \to \infty)$]
are shown in Table 5. The z value of 1.645, which corresponds
to the one-tailed P value of .05 for a normal distribution,
was used in these estimations because it is expected
that a smaller pixel size would provide better performance
than a larger pixel size.

From the standard deviation, $\sigma(\Delta A_z)$, and the observed
difference in $A_z$, we can estimate the maximum mean $\Delta A_z$
between two modalities. In our ROC experiment, we observed
a difference of $\Delta A_z$(observed) = $A_z$(small pixel) -
$A_z$(large pixel). Because of the variance, we do not know
the true population mean $\Delta A_z$(mean) of the normal distribution
from which the $\Delta A_z$(observed) was sampled. It can
be estimated, however, that we have a less than 5%
chance of observing this $\Delta A_z$ value if the population mean
$\Delta A_z$(mean) of the distribution is greater than [$\Delta A_z$(observed)
- (-1.645) · $\sigma(\Delta A_z)$]. This estimated bound of
mean $\Delta A_z$ is denoted as [max $\Delta A_z$(mean)] and tabulated in
Table 5.
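
The quantities discussed above can be recomputed from the variance components listed in Table 5. The sketch below does this for the 35-µm versus 70-µm pair. It is our reading of the relations described in the text and in the note to Table 5, not code from the study: the argument names (including var_eps for the effective error component) are ours, and the scaling of the case-dependent components by N0/N is inferred from the statement that only the reader component remains at infinite sample size.

```python
from math import sqrt

Z_ONE_TAILED_05 = 1.645  # z value for a one-tailed P of .05
N_READERS = 7


def total_sd(var_mc, var_mr, var_eps, readers, n0_over_n=1.0):
    """Standard deviation of the Az difference for sample size ratio N0/N."""
    variance = 2.0 * (var_mc * n0_over_n
                      + (var_mr + var_eps * n0_over_n) / readers)
    return sqrt(variance)


def sd_infinite_cases(var_mr, readers):
    """Limit of the standard deviation as the number of cases grows."""
    return sqrt(2.0 * var_mr / readers)


# 35 vs 70 um row of Table 5: variance components and observed difference.
var_mc, var_mr, var_eps = -0.000009, 0.000867, 0.000833
delta_observed = -0.02

sd_now = total_sd(var_mc, var_mr, var_eps, N_READERS)
sd_inf = sd_infinite_cases(var_mr, N_READERS)
max_mean_delta = delta_observed + Z_ONE_TAILED_05 * sd_now
min_detectable_delta = Z_ONE_TAILED_05 * sd_inf

print(f"sd of difference:          {sd_now:.4f}")
print(f"max mean difference:       {max_mean_delta:.3f}")
print(f"sd at infinite cases:      {sd_inf:.3f}")
print(f"min detectable difference: {min_detectable_delta:.3f}")
```

To within rounding, this reproduces the tabulated values for that row (0.0216, 0.016, 0.016, and 0.026).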

Because an increasing sample size reduces only the
variance while the population mean of the distribution of
$\Delta A_z$ remains the same, the [max $\Delta A_z$(mean)] estimated
earlier for a finite sample size may also be considered to
be the maximum mean $\Delta A_z$ for $N \to \infty$. Comparison of the
values of [max $\Delta A_z$(mean)], $\sigma(N \to \infty)$, and [min
$\Delta A_z(N \to \infty)$] in Table 5 shows that the [max $\Delta A_z$(mean)]
is approximately equal to $\sigma(N \to \infty)$ and is thus smaller
than [min $\Delta A_z(N \to \infty)$] for the 35- versus 70-µm and 35-
versus 105-µm image pairs when the sample size approaches
infinity. Therefore, the finite sample size in our
current ROC study is not the main contributor to the lack
of statistical significance in the difference for the 35- versus
70-µm and 35- versus 105-µm image pairs. The
small difference in $A_z$ relative to the large reader variance
may be the main reason we did not observe a statistically
significant advantage of the 35-µm pixel size over 70- or
105-µm pixel sizes in the characterization of malignant
and benign microcalcifications.

Table 2
Summary of Az Values

Pixel Size (µm)  Reader 1  Reader 2  Reader 3  Reader 4  Reader 5  Reader 6  Reader 7  Average*
35               0.68      0.62      0.75      0.75      0.65      0.74      0.77      0.71
70               0.73      0.71      0.77      0.80      0.64      0.65      0.77      0.73
105              0.80      0.63      0.73      0.81      0.73      0.60      0.77      0.73
140              0.69      0.64      0.68      0.80      0.68      0.74      0.76      0.71

Note: The standard deviations of the Az values ranged from 0.05 to 0.07.

*Az of average ROC curve, which was obtained by averaging the slope and intercept parameters of the individual ROC curves.

Figure 6. Dependence of the Az on pixel size for readers (a) 1, 4, and 5 and (b) 2, 3, 6, and 7.

Table 3
Comparison of 35-µm Images with 70-, 105-, and 140-µm Images

                 All Readers                  All Readers Except Reader 6
Pixel Size (µm)  DBM Method  Paired t Test    DBM Method  Paired t Test
35 vs 70         .51         .51              .11         .10
35 vs 105        .65         .65              .12         .12
35 vs 140        .93         .91              .96         .96

Note: Data are two-tailed P values.


Figure 7. The average ROC curves for the four pixel sizes. Each
curve was derived from the average slope and intercept parameters
of the individual ROC curves from the seven radiologists.

Table 4
Percentage of Positive and Negative Cases that Received a Confidence Rating of 1

         35-µm Pixel Size      70-µm Pixel Size      105-µm Pixel Size     140-µm Pixel Size
Reader   Negative   Positive   Negative   Positive   Negative   Positive   Negative   Positive
1        9.7        5.0        4.2        2.5        5.6        0.0        4.2        0.0
2        41.7       30.0       44.4       20.0       37.5       25.0       38.9       22.5
3        16.7       7.5        13.9       2.5        12.5       5.0        8.3        7.5
4        25.0       10.0       23.6       7.5        30.6       5.0        26.4       5.0
5        40.3       20.0       43.1       25.0       50.0       20.0       45.8       25.0
6        62.5       27.5       52.8       35.0       54.2       40.0       63.9       30.0
7        23.6       10.0       18.1       5.0        22.2       7.5        22.2       10.0

Table 5
Variance Components of the ROC Experiment

Modalities (µm)   $\sigma^2_{mc}$   $\sigma^2_{mr}$   $\sigma^2_{\epsilon}$   $\sigma(\Delta A_z)$   $\Delta A_z$(obs)   Max $\Delta A_z$(m)   $\sigma(N \to \infty)$   Min $\Delta A_z(N \to \infty)$
35 vs 70          -0.000009*        0.000867          0.000833                0.0216                 -0.02               0.016                 0.016                    0.026
35 vs 105         -0.000014*        0.001803          0.000778                0.0266                 -0.02               0.024                 0.023                    0.038
35 vs 140         0.000024          0.000488          0.000928                0.0213                 -0.00               0.035                 0.012                    0.020
70 vs 105         0.000002          0.001213          0.000728                0.0236                 -0.00               0.039                 0.019                    0.031
70 vs 140         0.000077          0.001195          0.000825                0.0270                 0.02                0.064                 0.018                    0.030
105 vs 140        0.000031          0.001888          0.000763                0.0286                 0.02                0.067                 0.023                    0.038

Note: Data were estimated with the bootstrap method of Beiden et al (17). The total variance $\sigma^2(\Delta A_z)$ is computed from the variance
components and Eq (21) of Beiden et al as $\sigma^2(\Delta A_z) = 2\{\sigma^2_{mc} + (\sigma^2_{mr} + \sigma^2_{\epsilon})/R\}$, where $R$ is the number of readers.
Max $\Delta A_z$(m) is the maximum mean difference in $A_z$ between two modalities. $\sigma(N \to \infty) = (2\sigma^2_{mr}/R)^{1/2}$ is the standard deviation
and Min $\Delta A_z(N \to \infty)$ is the minimum difference in $A_z$ between two modalities that will allow rejection of the null hypothesis,
$A_z$(small pixel) = $A_z$(large pixel), with P < .05 when the sample size $N$ approaches infinity. Max $\Delta A_z$(m) and Min $\Delta A_z(N \to \infty)$
are given at one-tailed P = .05. The variance component $\sigma^2_{mc}$ is negative in some cases due to the variance of the
bootstrap estimation; the error bars tightly bracket the neighborhood of zero.

*Data are negative owing to the variance of the bootstrap estimation; their error bars tightly bracket the neighborhood of zero.


Another interesting observation can be made from the analysis of the variance components. In this
classification task, the modality-by-case variance component is consistently near zero for any of the
paired comparisons. This means that even with an infinite number of readers, the variations in the two
modalities will completely follow each other. It is still possible that the two modalities will have
different mean performances, but cases that are more (or less) difficult with one modality will
completely follow in the direction of cases that are more (or less) difficult with the other modality.
This again seems to imply that the nature of the classification task is more dominant than the
appearance of the image with each modality.

One aspect of the interobserver variabilities is demonstrated in Table 4, where the radiologists'
decision thresholds for biopsy varied over a wide range. The large variation among the ROC curves in
Figure 6 indicates that the variation among the radiologists' biopsy recommendation is not entirely
caused by the use of a different decision threshold by each radiologist along similar ROC curves. This
suggests that the estimation of the likelihood of malignancy of microcalcifications based on their
mammographic features such as morphologic characteristics and spatial distribution pattern is very
different among the radiologists. It may be noted, however, that the majority of the cases used in this
ROC study had undergone biopsy so that easily distinguished benign cases had already been excluded
from the case samples.

To investigate the intraobserver variabilities in the classification of microcalcifications, we
repeated one reading session with three observers (readers 1-3). The distributions of the differences
in the confidence ratings between the two readings of the same film of a cluster by each radiologist
are shown in Figure 8. The differences in the ratings range from -5 to +3 for reader 1, -5 to +6 for
reader 2, and -3 to +4 for reader 3. This is consistent with the results of the variance analysis with
the method of Beiden et al, where the reader variance was found to be an important component of the
total variance for the classification of microcalcifications.

We also attempted to analyze the correlation of the estimated likelihood of malignancy when the same
images were read by different radiologists. The scatter plots of the malignancy ratings by every two
radiologists (not shown) were, in general, spread over wide ranges without obvious correlation. The
histograms of the difference in the malignancy ratings for the same cluster between two radiologists
were similar to those of the intraobserver variability shown in Figure 8, with ranges as wide as -6 to
+6. Some radiologists (eg, reader 1) tended to give higher likelihood of malignancy estimates for most
clusters than did other radiologists, and some (eg, reader 6) tended to have lower suspicion for
malignancy than did the others. These trends are consistent with the lower biopsy threshold of reader 1
and the higher biopsy threshold of reader 6 (Table 4).

We investigated whether the intraobserver variability in the malignancy ratings depended on the
perceived subtlety of the microcalcification cluster. Because reader 3 demonstrated the smallest range
of variability in the malignancy ratings among the three radiologists with whom we repeated the
experiment, we plotted the relationship of the difference in the malignancy ratings between the two
readings of the same cluster against the subtlety rating of the cluster for reader 3, as shown in
Figure 9. There was no obvious correlation between the variability in the malignancy ratings and the
perceived subtlety of the clusters.

The large inter- and intraobserver variabilities in the malignancy ratings may be a result of the fact
that the radiologists usually do not have to estimate specifically the likelihood of malignancy of the
clusters when they read mammograms in clinical practice. However, because their decision threshold for
biopsy recommendation also varied over a wide range, as discussed above, the variabilities were not
simply caused by their unfamiliarity in the estimation of the likelihood of malignancy. The
variabilities may again reflect the low specificity of the image features of the microcalcifications.
As can be seen from the examples in Figure 3, the appearance of a cluster of benign microcalcifications
from sclerosing adenosis can be very similar to that of a malignant cluster from intraductal carcinoma.


Figure 8. The distributions of the differences in the confidence ratings between the two readings of
the same film of a cluster by the same radiologist for readers 1-3. (Horizontal axis: difference in
malignancy rating, -6 to 7.)

Figure 9. Scatter plot shows the relationship between the differences in confidence ratings between
the two readings of the same cluster and the subtlety ratings of the cluster, as rated by reader 3.
The number next to a data point indicates the number of cases that overlap at the same point. Data
points without a number indicate that there is only one case at that point.


The dependence of classification accuracy on pixel size may be further weakened when other patient
information is available for making diagnostic decisions. In clinical practice, the decision for biopsy
is not dependent on the mammographic appearance alone. When the morphologic information is
nonspecific, other patient information (eg, age, family history, and personal history) becomes
important for estimating the likelihood of breast cancer. Because our goal was to evaluate whether the
classification accuracy of microcalcifications depended on the pixel size of the digitized images, we
did not provide such patient information to the observers. Our results indicate that the mammographic
information that a radiologist assesses from the displayed images, such as the morphologic
characteristics and spatial distribution pattern of the microcalcifications, does not have a strong
dependence on pixel size in the range studied.

It may be noted that in our current ROC study we concentrated on the effect of pixel size on the
classification of malignant and benign microcalcifications according to their mammographic features.
We previously conducted an ROC study (9) to compare the detectability of subtle microcalcifications on
original screen-film mammograms with that on mammograms digitized at a pixel size of 100 µm by using an
optical drum scanner. We found that the detection accuracy for the subtle microcalcifications
decreased when radiologists read the 100-µm pixel size digitized images. Results of another previous
study (10), in which we investigated the detection of microcalcifications by a computer program, also
indicated a reduction in detectability when the digitization pixel size increased from 35 to 140 µm.
The results from these experiments indicate that spatial resolution may be more important for the
detection than for the classification of microcalcifications in mammographic imaging.

In clinical practice, an important technique used by radiologists to estimate the likelihood of
malignancy of a microcalcification cluster is to evaluate its interval change between examinations.
The number of microcalcifications in a cluster is an important feature for characterizing changes.
High-quality mammograms that can provide sensitive detection of new, subtle microcalcifications are
crucial for such a task. The results of our previous studies (9,10) indicate that the spatial
resolution of mammographic images will affect the detectability of subtle microcalcifications. The
pixel size of digital mammograms may, therefore, affect the evaluation of interval changes, although
the effect will be reduced with the use of magnification views. Because the radiologists in our current
study were not provided with images from previous examinations for comparison, the effects of pixel
size on the detection of interval change will warrant further investigation.

Another possible reason that the images with a 35-µm pixel size did not provide better classification
accuracy for malignant and benign microcalcifications than did images with 70- or 105-µm pixel sizes,
as observed in this study, is the higher noise level in the digitized images at this small pixel size.
A higher noise level will reduce the signal-to-noise ratio of the image and may interfere with the
perception of image features. It is possible that if the radiation dose to the patient is unlimited, a
digital mammography system with a smaller pixel size can provide better classification. In the current
study, we investigated the dependence of classification accuracy on pixel size under the constraint of
equal radiation dose. The trade-off between image quality and radiation dose and the acceptability of
higher-dose techniques are beyond the scope of this study. Furthermore, because digitized mammograms
and mammograms acquired with digital detectors have different noise, contrast sensitivity, and
resolution properties, further investigations are needed to determine whether a similar trend holds
for mammograms acquired with different types of digital detectors.

In conclusion, we performed an ROC study to investigate the effects of pixel size on the
classification of malignant and benign microcalcifications on digitized mammograms. Our results
indicate that the differences in the Az between pairs of pixel sizes ranging from 35 to 140 µm do not
achieve statistical significance. The pixel sizes in this range therefore do not have a strong effect
on radiologists' accuracy in the characterization of microcalcifications. The low specificity of the
image features of microcalcifications and the large interobserver and intraobserver variabilities may
have prevented small advantages in image resolution from being observed.

Classifier design is one of the key steps in the development of computer-aided diagnosis (CAD)
algorithms. A classifier is designed with case samples drawn from the patient population. Generally,
the sample size available for classifier design is limited, which introduces variance and bias into the
performance of the trained classifier, relative to that obtained with an infinite sample size. For CAD
applications, a commonly used performance index for a classifier is the area, A_z, under the receiver
operating characteristic (ROC) curve. We have conducted a computer simulation study to investigate the
dependence of the mean performance, in terms of A_z, on design sample size for a linear discriminant
and two nonlinear classifiers, the quadratic discriminant and the backpropagation neural network
(ANN). The performances of the classifiers were compared for four types of class distributions that
have specific properties: multivariate normal distributions with equal covariance matrices and unequal
means, unequal covariance matrices and unequal means, and unequal covariance matrices and equal means,
and a feature space where the two classes were uniformly distributed in disjoint checkerboard regions.
We evaluated the performances of the classifiers in feature spaces of dimensionality ranging from 3 to
15, and design sample sizes from 20 to 800 per class. The dependence of the resubstitution and hold-out
performance on design (training) sample size (N_t) was investigated. For multivariate normal class
distributions with equal covariance matrices, the linear discriminant is the optimal classifier. It
was found that its A_z-versus-1/N_t curves can be closely approximated by linear dependences over the
range of sample sizes studied. In the feature spaces with unequal covariance matrices where the
quadratic discriminant is optimal, the linear discriminant is inferior to the quadratic discriminant or
the ANN when the design sample size is large. However, when the design sample is small, a relatively
simple classifier, such as the linear discriminant or an ANN with very few hidden nodes, may be
preferred because performance bias increases with the complexity of the classifier. In the regime where
the classifier performance is dominated by the 1/N_t term, the performance in the limit of infinite
sample size can be estimated as the intercept (1/N_t = 0) of a linear regression of A_z versus 1/N_t.
The understanding of the performance of the classifiers under the constraint of a finite design sample
size is expected to facilitate the selection of a proper classifier for a given classification task and
the design of an efficient resampling scheme. © 1999 American Association of Physicists in Medicine.
[S0094-2405(99)00212-6]

Key words: computer-aided diagnosis, classifier design, linear classifier, quadratic classifier,
neural network, sample size, feature space dimensionality, ROC analysis


I.  INTRODUCTION 

With the advent of digital imaging modalities, computer-aided diagnosis (CAD) is becoming an important
area of research in medical imaging. A CAD algorithm can detect abnormalities and classify disease or
normal cases based on image and/or patient information, and thus provide a second opinion to the
radiologist in the detection or diagnostic decision making process.

Design of classifiers that can accurately distinguish normal and abnormal features is a critical step
in the development of CAD algorithms. It has been shown that the performance of a classifier for
unknown cases depends on the sample size used for training. When a finite design (training) sample size
is used, the performance is pessimistically biased in comparison to that obtained from an infinitely
large design sample. In order to design a classifier with a performance generalizable to the population
at large, one has to use a sufficient number of case samples that are representative of the population.
However, the availability of case samples is often limited in medical imaging research. It is therefore
important to study the sample-size dependence of different classifiers and determine the most efficient
way of training a classifier, under the constraint of a finite sample size.
We note that the concept of generalizability may be used in several technical senses when assessing
the performance of a classifier: one with respect to mean classifier performance, the other with
respect to the variance of classifier performance. In many classifier design problems, one is most
interested in investigating if the mean performance of a classifier estimated from a given set of
finite design samples can be generalized to classification performance with unknown test samples drawn
from the same population of cases. The generalizability in this regard can be observed from the biases
of the mean performances in the finite design set and in the test set in comparison to the optimal
performance estimated from an infinite design set. The bias in the mean performance of different
classifiers under various input conditions is the subject of investigation in this study. We will
further discuss other interpretations of generalizability in the Discussion section of this paper.

A number of investigators have studied the finite-sample-size problem. Fukunaga derived a general
formulation for the bias and variance of a function, f, which is to be estimated from the available
samples. When f is a nonlinear function of the mean vectors and covariance matrices of two feature
distributions, it has been shown that a bias results from the nonlinear propagation of the
finite-sample variances in the estimates of the mean vectors and covariance matrices of the
distributions through this function. For multivariate-normal data, these variances are proportional to
1/N_t, where N_t is the design sample size, and this dependence propagates into the lowest-order terms
in the bias. The bias is independent of the test sample size, N_test. All measures of classifier
performance that count the fraction of times the decision value for an abnormal case exceeds that for a
normal case (independent of underlying distribution), and various measures of error for normally
distributed decision functions, are nonlinear functions of the parameters of the underlying
distributions. They are thus subject to this effect. Fukunaga and Hayes analyzed the finite sample
effects on the probability of misclassification (PMC) of a classifier and suggested a technique that
makes use of the linear dependence of PMC on 1/N_t to estimate the performance at N_t → ∞ with a finite
sample set.

For the evaluation of medical diagnostic systems, the most commonly used performance index is the area
under the receiver operating characteristic (ROC) curve, A_z. We have derived analytically that, for
linear discriminant classifiers, the classifier performance in terms of A_z can be approximated by a
linear function in 1/N_t, under conditions when higher order terms in N_t can be neglected. We have
been investigating the dependence of A_z on sample size by simulation studies. Wagner et al. have also
analyzed the effects of design and test sample sizes on the variance components of the classifier
performance. Although these behaviors depend strongly on the class distributions and the properties of
the classifier, the studies will provide some insight into the sample size requirements for the design
of different classifiers. This work may eventually lead to the selection of an efficient resampling
scheme for classifier design, as well as the development of a statistical test of the sample size
requirements and the generalizability of the trained classifier.

Fig. 1. The sampling and evaluation scheme of the simulation study.

In this paper, we will describe the simulation studies and analyze the effects of sample size on
classifier performance. Several commonly used classifiers, including the linear discriminant, the
quadratic discriminant, and the backpropagation neural network, will be studied and compared under
different input conditions. Feature distributions with markedly different characteristics will be used
to represent a variety of situations that may be encountered in classification problems for many
detection or diagnostic tasks.

II.  MATERIALS  AND  METHODS 

We performed simulation studies to evaluate the effects of sample size on classifier design. Normal
and abnormal case samples were randomly drawn from known probability distributions of the two classes.
These samples were then used to design classifiers for differentiation of normal and abnormal cases.
The simulation approach assures that any number of case samples can be obtained from populations with
known statistical properties. It thus allows evaluation of the dependence of classifier performance on
design sample size and comparison of the performance with theoretically predicted optimal
classification based on the chosen probability distributions.

A.  Simulation  study 

The sampling and evaluation scheme of the simulation study is shown in Fig. 1. In this study, we
considered only the situation in which equal numbers (= N_total/2) of normal and abnormal cases
randomly drawn from the class distributions were available in our data set. A resampling strategy
similar to the technique suggested by Fukunaga and Hayes was devised to generate the A_z-vs-1/N_t
curve. Subsets of N_1, N_2, …, N_j design samples were randomly drawn from the available sample set,
again under the constraint that the numbers of normal and abnormal samples were equal in each subset,
i.e., N_i,normal = N_i,abnormal = N_i/2 (i = 1, …, j). A classifier was designed by using each subset of
samples. The random sampling of a given subset from the available set of N_total samples was performed
without replacement, whereas the random sampling of different subsets always started from the same set
of N_total samples. Therefore, after drawing a given design subset of N_i samples, the remaining
N_total - N_i samples were independent of the design samples and used as the test samples. For
simplicity, the number of design samples per class is denoted as N_t in the following discussion.
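
The design/test split described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function name `split_design_test`, the placeholder case records, and the pool size of 1000 cases per class are assumptions made for the example.

```python
import random

def split_design_test(normal, abnormal, n_design_per_class, rng):
    """Draw equal-size design subsets without replacement; the rest is the test set."""
    idx_n = rng.sample(range(len(normal)), n_design_per_class)
    idx_a = rng.sample(range(len(abnormal)), n_design_per_class)
    set_n, set_a = set(idx_n), set(idx_a)
    design = ([normal[i] for i in idx_n], [abnormal[i] for i in idx_a])
    test = ([x for i, x in enumerate(normal) if i not in set_n],
            [x for i, x in enumerate(abnormal) if i not in set_a])
    return design, test

# Hypothetical pool: 1000 placeholder cases per class.
rng = random.Random(0)
normal = [("n", i) for i in range(1000)]
abnormal = [("a", i) for i in range(1000)]
design, test = split_design_test(normal, abnormal, 20, rng)
print(len(design[0]), len(design[1]), len(test[0]), len(test[1]))
```

Because every subset is drawn from the same full pool, calling the function again with a different subset size yields another point on the curve while keeping design and test samples disjoint.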

In general, there are two methods, resubstitution and hold-out, for testing classifier performance. In
the resubstitution method, the design sample set is resubstituted into the trained classifier to test
its performance, whereas in the hold-out method, an independent test set is used. It has been shown
that, for a Bayes classifier, if the classifier is trained with a finite number of design samples, the
resubstitution estimate of the classifier performance is optimistically biased whereas the hold-out
estimate is pessimistically biased in comparison to that achievable with an infinite design sample set.
The mean performance obtained from the former estimation provides an upper bound and that from the
latter provides a lower bound on the true classifier performance. When the design sample size is
limited, it is important to evaluate the hold-out performance to avoid an overly optimistic prediction
of the classifier performance. In the limit of very large sample size, the upper and lower bounds
converge towards the unbiased estimate.
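
The opposite biases of the two estimates can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: it uses a simplified linear discriminant in which the covariance matrix is taken as known (identity), so the weight vector is just the difference of the sample means; the area under the ROC curve is computed as the Mann-Whitney statistic; and the dimensionality (15), design size (20 per class), test size (200 per class), and number of trials (30) are arbitrary choices.

```python
import random

def auc(neg_scores, pos_scores):
    """Area under the ROC curve as the Mann-Whitney statistic (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def draw(rng, n, k, mean):
    """n feature vectors from a k-dimensional normal with identity covariance."""
    return [[rng.gauss(mean, 1.0) for _ in range(k)] for _ in range(n)]

def mean_vec(xs):
    return [sum(col) / len(xs) for col in zip(*xs)]

def one_trial(rng, k=15, n_design=20, n_test=200, delta=3.0):
    m = (delta / k) ** 0.5  # equal mean components giving squared distance delta
    d0, d1 = draw(rng, n_design, k, 0.0), draw(rng, n_design, k, m)
    t0, t1 = draw(rng, n_test, k, 0.0), draw(rng, n_test, k, m)
    # Simplified discriminant (assumption): weights = difference of sample means.
    w = [b - a for a, b in zip(mean_vec(d0), mean_vec(d1))]
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    resub = auc([score(x) for x in d0], [score(x) for x in d1])
    hold = auc([score(x) for x in t0], [score(x) for x in t1])
    return resub, hold

rng = random.Random(1)
trials = [one_trial(rng) for _ in range(30)]
mean_resub = sum(r for r, _ in trials) / len(trials)
mean_hold = sum(h for _, h in trials) / len(trials)
print(f"resubstitution {mean_resub:.3f}  hold-out {mean_hold:.3f}")
```

With a small design set the mean resubstitution value lies above the mean hold-out value, which is the bracketing behavior described in the paragraph above.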

In this study, we evaluated the performance of the classifier using both the resubstitution and the
hold-out methods as a function of finite design sample size N_t. In order to reduce the variances in
the estimates of A_z, we repeatedly resampled without replacement each design subset from the same set
of samples, trained and tested the classifier each time, and estimated the average A_z from the
individual A_z values as shown in Fig. 1. The resubstitution or hold-out A_z-vs-1/N_t curve was plotted
from the j points and the unbiased estimate of A_z in the limit of N_t → ∞ could be extrapolated from
either curve.
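
The extrapolation step amounts to fitting a straight line to A_z against 1/N_t and reading off the intercept at 1/N_t = 0. The sketch below uses synthetic hold-out values that are exactly linear in 1/N_t; the numbers (an asymptote of 0.890 and a slope of -1.2) are hypothetical and chosen only to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical hold-out results: mean Az at several design sample sizes per class.
n_t = [20, 40, 80, 160, 400, 800]
az_hold = [0.890 - 1.2 / n for n in n_t]  # synthetic, exactly linear in 1/N_t

intercept, slope = fit_line([1.0 / n for n in n_t], az_hold)
print(f"Az at 1/N_t = 0: {intercept:.3f}, slope: {slope:.2f}")
```

For real data the fit should be restricted to the range of N_t in which the curve is approximately linear; points at very small N_t can pull the intercept away from the asymptote.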

This method of estimating classifier performance at large N_t by generating a few data points at
finite sample sizes is similar to the Fukunaga and Hayes technique. However, we did not assume that the
j points were in the linear region of the A_z-vs-1/N_t curve and we used resampling to reduce the
variances. In fact, one of the goals of this study was to investigate the range of design sample size
in which the performance curve was approximately linear for various classifiers and probability
distributions of the class populations. Therefore, we used a much larger total number of samples
(N_total = 2000) in our simulation study than was generally available for classifier design. We could
then choose N_t over a wide range and study the behavior of the entire A_z-vs-1/N_t curve.

To estimate the population mean of A_z at each N_t, we repeated the above experiment a number of
times, each with 2000 independently drawn samples from the population. The population mean of A_z was
estimated by averaging the A_z values obtained from the experiments. We did not analyze the variances
in this study because of the complication in the correlation among the values of A_z introduced by
resampling. A detailed analysis of the variances and its modeling was performed in a separate study by
Wagner et al., in which a different study design was used.
By varying the number of design samples per class, N_t, over a large range from 20 to 800, the regime
where the 1/N_t dependence dominated could be observed from the A_z (population mean)-vs-1/N_t (or
1/N_i) curves. It is important to note that, although the number of test samples, N_test = 2000 - N_i,
varied from point to point on both the resubstitution and the hold-out curves, the bias in A_z is
independent of N_test. The shape of the A_z-vs-1/N_t curve is independent of N_test, whether or not
N_test is held fixed. However, the variance of a given A_z does depend on the test sample size.

For simplicity, we will refer to these estimates of A_z (population mean) as A_z(tr) for the
resubstitution and as A_z(ts) for the hold-out performance in the following discussions.


B. Class distributions
1. Multivariate normal distributions

For three of the four types of class distributions, we assumed that the normal and abnormal classes
followed multivariate normal distributions in the feature space. The dimensionality of the feature
space, k, was varied from 3 to 15. The characteristics of the multivariate normal distributions can be
completely specified by the multivariate mean vector of the ith class, denoted as μ_i (i = 1, 2), and
its covariance matrix, denoted as Σ_i. The separation of the normal and abnormal classes is measured by
the Bhattacharyya distance, B, defined as

$$B = \frac{1}{8}\Delta + \frac{1}{2}\ln\frac{\det[(\Sigma_1+\Sigma_2)/2]}{\sqrt{\det\Sigma_1}\,\sqrt{\det\Sigma_2}}, \tag{1}$$

where det Σ_i denotes the determinant of Σ_i and Δ is the squared Mahalanobis distance, defined as

$$\Delta = (\mu_2-\mu_1)^{T}\left[\frac{\Sigma_1+\Sigma_2}{2}\right]^{-1}(\mu_2-\mu_1). \tag{2}$$

The Mahalanobis distance is the Euclidean distance between the means of the two distributions,
normalized by the square root of the average of their covariance matrices. It can therefore be
considered to be a measure of the signal-to-noise ratio (SNR) between the abnormal and the normal
distributions. The second term of B is the contribution from the difference in the covariance matrices
of the two class distributions. If the covariance matrices are equal, the second term will be zero and
the Bhattacharyya distance will be equal to 1/8 of the squared Mahalanobis distance.
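
As a small numerical illustration of the two distance measures just defined, the sketch below evaluates them for diagonal covariance matrices, where the determinants and the matrix inverse reduce to products and reciprocals of the per-feature variances. The restriction to diagonal matrices and the example values are assumptions made to keep the code short; the function names are illustrative.

```python
import math

def mahalanobis_sq(mu1, mu2, var1, var2):
    """Squared Mahalanobis distance for diagonal covariances (averaged variances)."""
    return sum((b - a) ** 2 / ((v1 + v2) / 2.0)
               for a, b, v1, v2 in zip(mu1, mu2, var1, var2))

def bhattacharyya(mu1, mu2, var1, var2):
    """Bhattacharyya distance for diagonal covariances: Delta/8 + log-determinant term."""
    log_det_term = sum(math.log((v1 + v2) / 2.0) - 0.5 * (math.log(v1) + math.log(v2))
                       for v1, v2 in zip(var1, var2))
    return mahalanobis_sq(mu1, mu2, var1, var2) / 8.0 + 0.5 * log_det_term

# Equal (identity) covariances, k = 3, mean components of 1: Delta = 3, B = 3/8.
b_equal = bhattacharyya([0, 0, 0], [1, 1, 1], [1, 1, 1], [1, 1, 1])
# Equal means, unequal variances: only the second term contributes.
b_cov_only = bhattacharyya([0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1.5, 2])
print(b_equal, b_cov_only)
```

The first case reproduces the statement that B equals one eighth of the squared Mahalanobis distance when the covariance matrices are equal; the second shows a nonzero B arising purely from the difference in covariance.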

In the current study, three types of multivariate normal class distributions were considered. In the
following discussion, we shall refer to the use of simultaneous diagonalization for the two covariance
matrices of the class distributions. This operation leaves the normal-based decision functions
unchanged because the distance measures that arise in these decision functions are invariant to any
non-singular linear transformation.

(1) Equal covariance matrices and unequal means: In this case, the covariance matrices of the normal
and abnormal class distributions can be simultaneously diagonalized and the variances of the
individual feature components can be scaled to unity. Therefore, without loss of generality, the
covariance matrices of the two classes could be assumed to be equal to identity matrices,
Σ_1 = Σ_2 = I. The mean feature vector for the first class was assumed to be zero, μ_1 = 0, and the mean
feature vector for the second class, μ_2 = M, with all components of M equal to a constant m. The
magnitude of m could be adjusted to obtain a desired separation of the two classes. For the purpose of
this simulation study, we chose m such that the squared Mahalanobis distance was 3, i.e., the
Bhattacharyya distance was 3/8, for feature spaces of any dimensionality. As discussed below, this
separation corresponds to a theoretical A_z of 0.89, which is in the performance range of many
classification problems in CAD applications. An example of the two class distributions in a 2D feature
space is shown schematically in Fig. 2.

Fig. 2. A schematic illustration of the two class distributions with equal covariance matrices and
unequal means in a 2D feature space. The circles represent contours of equal probability in each
distribution.

Fig. 3. A schematic illustration of the two class distributions with unequal covariance matrices and
unequal means in a 2D feature space. The closed curves represent contours of equal probability in each
distribution.
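
The quoted theoretical value for the equal-covariance case can be checked with the standard binormal result: when both classes are normal with equal covariance matrices, the optimal linear decision variable is normally distributed in each class and A_z = Φ(√(Δ/2)), where Φ is the standard normal cumulative distribution function and Δ is the squared Mahalanobis distance. The sketch below applies this relation and also computes the common mean component m that gives Δ = 3 for identity covariance matrices in k dimensions; the function names are illustrative.

```python
import math

def mean_component(delta, k):
    """Common mean component m such that k * m**2 equals the squared distance delta."""
    return math.sqrt(delta / k)

def theoretical_az(delta):
    """Az = Phi(sqrt(delta / 2)) for equal-covariance multivariate normal classes."""
    return 0.5 * (1.0 + math.erf(math.sqrt(delta / 2.0) / math.sqrt(2.0)))

az = theoretical_az(3.0)
for k in (3, 9, 15):
    print(k, round(mean_component(3.0, k), 3))
print(round(az, 2))
```

The mean component shrinks as the dimensionality grows because the same total separation is spread over more features.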

(2) Unequal covariance matrices and unequal means: The covariance matrix of the first class was again diagonalized and scaled to be an identity matrix, $\Sigma_1 = I$, and the mean feature vector for the first class was assumed to be zero, $\mu_1 = 0$. The covariance matrix of the second class, $\Sigma_2$, was simultaneously diagonalized to have eigenvalues $\lambda_1, \ldots, \lambda_k$. For this study, we generated the values of $\lambda$ with the simple relationship

[equation illegible in the source]

and evaluated one condition where [illegible] = 1, and [illegible] 2 for all dimensionalities of the feature spaces. We also assumed that the components of the mean feature vector $\mu_2$ were equal, the values of which were adjusted to achieve a Bhattacharyya distance of 3/8. For the purpose of demonstrating the general trends of the $A_z$-vs-$1/N$ curves and comparing the relative performance of the different classifiers under the various conditions, the specific choices of these values are not critical. Figure 3 illustrates an example of the two class distributions in a 2D feature space.

(3) Unequal covariance matrices and equal means: The covariance matrix of the first class was the same as that in the first two cases described above. The covariance matrix of the second class was proportional to the identity matrix, $\Sigma_2 = \alpha I$, where the proportionality constant $\alpha$ was adjusted to provide a Bhattacharyya distance of 3/8. The mean feature vectors of the two classes were equal, $\mu_1 = \mu_2 = 0$. In this case, the discriminatory power of the two classes comes entirely from the difference in the covariance matrices. A schematic of the two class distributions in a 2D feature space is shown in Fig. 4.

2. Checkerboard distributions

The fourth type of class distributions was a checkerboard where the normal and abnormal classes were located in alternate square box regions of the feature space. Within each box of the checkerboard, the feature vectors were uniformly distributed. The two classes did not overlap with each other so that they could be perfectly separated by an "ideal" classifier with $A_z = 1$. We considered a 2×3 checkerboard in a 2D feature space and a 2×2×2 checkerboard in a 3D feature space. The example of a 2×3 checkerboard in a 2D feature space is shown in Fig. 5. Such class distributions may not be common in actual classification problems encountered in CAD. However, they were included in this study to demonstrate the capability and limitations of the different classifiers when the class distributions were not multivariate normal.

C. Classifiers

We studied three types of classifiers: the linear discriminants, the quadratic discriminants, and the back-propagation neural networks. They represent a range of classifiers commonly used in the field of pattern recognition at present.


Fig. 4. A schematic illustration of the two class distributions with unequal covariance matrices and equal means in a 2D feature space. The circles represent contours of equal probability in each distribution.

Fig. 5. An example of a 2×3 checkerboard in a 2D feature space.


(1) Linear discriminant classifier: The linear discriminant classifier can be derived from the means and the covariance matrices of the class distributions as follows:

$$h(X) = (\mu_2 - \mu_1)^T \Sigma^{-1} X + \tfrac{1}{2}\left(\mu_1^T \Sigma^{-1} \mu_1 - \mu_2^T \Sigma^{-1} \mu_2\right), \qquad (4)$$

where $\Sigma = (\Sigma_1 + \Sigma_2)/2$, and $X$ is the feature vector to be classified. The means and covariance matrices have to be estimated as the sample means and sample covariance matrices from the available design samples. The sample means and covariance matrices undergo a nonlinear transformation to become the discriminant scores, which in turn are transformed nonlinearly into a measure of the performance. The variances in the estimated parameters propagate into the mean classifier performance and result in a bias through the second derivative of the transformation function.

It is known that, for multivariate normal distributions with equal covariance matrices, the linear discriminant classifier is optimal and the classifier performance in the limit of large design samples is determined by the Mahalanobis distance, given by

$$\Delta = (\mu_2 - \mu_1)^T \Sigma^{-1} (\mu_2 - \mu_1). \qquad (5)$$

For the class distributions with $\Delta = 3$ to be used in this study, it can be derived from Eq. (5) that the maximum $A_z$ that the optimal linear discriminant can achieve in the limit of large design samples is 0.89.
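
As an illustration of how the linear discriminant is built from design samples, the following sketch estimates the sample means and the averaged sample covariance matrix and forms the score $h(X) = (\mu_2 - \mu_1)^T \Sigma^{-1} X + \tfrac{1}{2}(\mu_1^T \Sigma^{-1} \mu_1 - \mu_2^T \Sigma^{-1} \mu_2)$. This standard form is an assumption, as are the function name, the sample sizes, and the random seed; it is not the authors' code.

```python
import numpy as np

def fit_linear_discriminant(x1, x2):
    """Return (w, b) such that h(x) = w @ x + b.

    x1, x2: arrays of shape (n_samples, k) holding the design samples
    of class 1 and class 2.
    """
    mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
    # Average of the two sample covariance matrices
    sigma = 0.5 * (np.cov(x1, rowvar=False) + np.cov(x2, rowvar=False))
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu2 - mu1)
    b = 0.5 * (mu1 @ sigma_inv @ mu1 - mu2 @ sigma_inv @ mu2)
    return w, b

# Simulated equal-covariance classes with squared Mahalanobis distance 3
rng = np.random.default_rng(0)
k = 3
m = np.sqrt(3.0 / k)
x1 = rng.standard_normal((2000, k))
x2 = rng.standard_normal((2000, k)) + m

w, b = fit_linear_discriminant(x1, x2)
h1, h2 = x1 @ w + b, x2 @ w + b
print("mean score, class 1:", h1.mean())
print("mean score, class 2:", h2.mean())
```

With this sign convention, class 2 samples tend to receive positive scores and class 1 samples negative scores.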

(2) Quadratic discriminant classifier: The quadratic discriminant classifier can be expressed as

$$h(X) = \tfrac{1}{2}(X - \mu_1)^T \Sigma_1^{-1} (X - \mu_1) - \tfrac{1}{2}(X - \mu_2)^T \Sigma_2^{-1} (X - \mu_2) + \tfrac{1}{2}\ln\frac{\det \Sigma_1}{\det \Sigma_2}. \qquad (6)$$

When the class distributions are multivariate normal with unequal covariance matrices, the quadratic discriminant classifier is optimal in the limit of large training samples. The Bhattacharyya distance gives an upper bound on the Bayes error. The general properties of the linear and quadratic classifiers have been described in the literature (for example, Fukunaga).

Fig. 6. A schematic diagram of a backpropagation neural network with one hidden layer.
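
The quadratic discriminant can be sketched in the same way. The code below assumes the standard form $h(X) = \tfrac{1}{2}(X-\mu_1)^T \Sigma_1^{-1}(X-\mu_1) - \tfrac{1}{2}(X-\mu_2)^T \Sigma_2^{-1}(X-\mu_2) + \tfrac{1}{2}\ln(\det\Sigma_1/\det\Sigma_2)$ and applies it to an equal-mean example in which a linear classifier would have no discriminatory power. The covariance scaling of 4, the sample sizes, and all names are illustrative assumptions.

```python
import numpy as np

def fit_quadratic_discriminant(x1, x2):
    """Return a function computing the quadratic discriminant score h(x)."""
    mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
    s1, s2 = np.cov(x1, rowvar=False), np.cov(x2, rowvar=False)
    s1_inv, s2_inv = np.linalg.inv(s1), np.linalg.inv(s2)
    _, logdet1 = np.linalg.slogdet(s1)
    _, logdet2 = np.linalg.slogdet(s2)

    def score(x):
        d1, d2 = x - mu1, x - mu2
        # Row-wise quadratic forms d^T S^{-1} d
        q1 = np.einsum("ij,jk,ik->i", d1, s1_inv, d1)
        q2 = np.einsum("ij,jk,ik->i", d2, s2_inv, d2)
        return 0.5 * q1 - 0.5 * q2 + 0.5 * (logdet1 - logdet2)

    return score

rng = np.random.default_rng(1)
k = 3
x1 = rng.standard_normal((2000, k))        # class 1: identity covariance
x2 = 2.0 * rng.standard_normal((2000, k))  # class 2: covariance 4*I, same mean

score = fit_quadratic_discriminant(x1, x2)
frac1 = np.mean(score(x1) < 0)  # class 1 samples with negative score
frac2 = np.mean(score(x2) > 0)  # class 2 samples with positive score
print(frac1, frac2)
```

Both fractions exceed one half here even though the class means coincide, because the score depends on the distance from the common mean.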

(3) Back-propagation neural network: Many different architectures and training methods have been developed for artificial neural networks (ANN) in various applications. In this study, we considered only a three-layered feed-forward neural network trained with the back-propagation method. The neural network has $k$ input nodes, $n$ hidden nodes, one output node, and a bias node in both the input and the hidden layers. The ANN architecture is denoted as $k$-$n$-1. The nodes in the ANN are fully connected and are trained with a minimum sum-of-squares-error criterion. The number of weights to be estimated is equal to $n(k+1) + (n+1)$. A schematic diagram of an ANN is shown in Fig. 6.

III.  RESULTS 

In our simulation study, we compared the performance of the linear, quadratic, and backpropagation neural network classifiers for the different class distributions in feature spaces of dimensionality ranging from 3 to 15. The number of repeated experiments was chosen to be 20 for all cases in the multivariate normal feature spaces and 100 in the checkerboard feature space. The number of data set partitionings in each experiment ranged from 1 to 20. These choices are a compromise between computation time and estimation accuracy, especially for ANN classifiers with a large number of hidden nodes in high-dimensional feature spaces. As shown in the graphs discussed below, some of the performance curves may exhibit fluctuations that could be reduced by a larger number of experiments. However, the general trend of the performance curves should not be changed by the statistical uncertainties.
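
A simplified version of such an experiment can be sketched as follows for the linear discriminant and the equal-covariance class distributions. Several simplifications are assumptions of this sketch and not the procedure of the study: the area under the ROC curve is estimated with the Mann-Whitney statistic as a stand-in for the fitted $A_z$, independent test samples are drawn directly from the population instead of partitioning a finite data set, and the sample sizes and seed are arbitrary.

```python
import numpy as np

def auc(neg, pos):
    """Mann-Whitney estimate of the area under the ROC curve (no tie handling)."""
    scores = np.concatenate([neg, pos])
    ranks = scores.argsort().argsort() + 1          # 1-based ranks
    r_pos = ranks[len(neg):].sum()
    return (r_pos - len(pos) * (len(pos) + 1) / 2) / (len(neg) * len(pos))

def lda_direction(x1, x2):
    """Weight vector of the linear discriminant from design samples."""
    sigma = 0.5 * (np.cov(x1, rowvar=False) + np.cov(x2, rowvar=False))
    return np.linalg.solve(sigma, x2.mean(axis=0) - x1.mean(axis=0))

def run(k, n_train, n_test=500, n_exp=20, seed=0):
    """Mean resubstitution and hold-out AUC over n_exp repeated experiments."""
    rng = np.random.default_rng(seed)
    m = np.sqrt(3.0 / k)                            # squared Mahalanobis distance 3
    a_tr, a_ts = [], []
    for _ in range(n_exp):
        tr1 = rng.standard_normal((n_train, k))
        tr2 = rng.standard_normal((n_train, k)) + m
        ts1 = rng.standard_normal((n_test, k))
        ts2 = rng.standard_normal((n_test, k)) + m
        w = lda_direction(tr1, tr2)
        a_tr.append(auc(tr1 @ w, tr2 @ w))          # resubstitution
        a_ts.append(auc(ts1 @ w, ts2 @ w))          # hold-out
    return float(np.mean(a_tr)), float(np.mean(a_ts))

results = {n: run(9, n) for n in (20, 50, 200)}
for n, (tr, ts) in results.items():
    print(f"N = {n:3d}  1/N = {1 / n:.3f}  A(tr) = {tr:.3f}  A(ts) = {ts:.3f}")
```

Plotting the two columns against $1/N$ gives a resubstitution curve above and a hold-out curve below the large-sample value, which is the qualitative behavior examined in the figures that follow.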

(1) Multivariate normal distributions, equal covariance matrices and unequal means: For class distributions with equal covariance matrices, the linear discriminant is theoretically the optimal classifier when the design sample size is large. However, when the design sample size is small, the performances of all classifiers are biased. Figures 7(a)-7(c) show the dependence of the $A_z$ obtained from resubstitution (training), $A_z(\mathrm{tr})$, and the $A_z$ obtained from the hold-out method (testing), $A_z(\mathrm{ts})$, on $1/N$ for the linear, ANN, and quadratic classifier, respectively. Two hidden nodes were used for the ANN ($k$-2-1) because it is the smallest number of hidden nodes in a nonlinear ANN. An ANN with only one hidden node will be a linear classifier and behave in a similar manner as the linear discriminant. On the other hand, ANNs with a large number of hidden nodes (not shown) will overfit the design samples and have poor generalizability to the unknown cases, similar to the ANN curves to be discussed below. All three classifiers can reach the optimal classification accuracy of $A_z = 0.89$ in the limit of large $N$. The curves for the linear classifier and the ANN ($k$-2-1) at 400 training epochs (iterations) are approximately linear over the entire range. The quadratic classifier does not reach the approximately linear region until $N$ is greater than about 100 ($1/N < 0.01$) in the higher-dimensional feature space. The biases on both the resubstitution and hold-out curves for the quadratic classifier are greater than those for the linear classifier and the ANN ($k$-2-1). The large biases again indicate overfitting and poor generalization by the quadratic classifier in the equal-covariance-matrices situation.

Fig. 7. The dependence of the $A_z$ obtained from resubstitution (training, solid lines), $A_z(\mathrm{tr})$, and the $A_z$ obtained from the hold-out method (testing, dashed lines), $A_z(\mathrm{ts})$, on $1/N$ for the class distributions with equal covariance matrices and unequal means. (a) Linear, (b) ANN, and (c) quadratic classifier. Legend: F3 = 3D feature space, etc. Horizontal axis: 1/No. of training samples per class.

Fig. 8. The performances of the classifiers for class distributions with unequal covariance matrices and unequal means. (a) Linear, (b) ANN classifier. Legend: F3 = 3D feature space, etc.; solid lines = $A_z(\mathrm{tr})$, dashed lines = $A_z(\mathrm{ts})$.

(2) Multivariate normal distributions, unequal covariance matrices and unequal means: The performances of the classifiers for class distributions with unequal covariance matrices are shown in Figs. 8(a)-8(b). The curves for the linear discriminant and the ANN ($k$-2-1) classifier (not shown) are again approximately linear over the entire range of $N$ studied. However, the $A_z$ at $1/N = 0$ decreases as the dimensionality of the feature space increases. This is because both the linear discriminant and the near-linear ANN ($k$-2-1) cannot make use of the class separability due to the differences in the covariance matrices, which is the second term in the Bhattacharyya distance. The second term increases relative to the first term, the squared Mahalanobis distance, when the Bhattacharyya distance is fixed and the dimensionality of the feature space increases.
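
The two terms of the Bhattacharyya distance mentioned above can be computed separately. The sketch below uses the standard expression for two normal distributions, $B = \tfrac{1}{8}(\mu_2-\mu_1)^T \bar\Sigma^{-1} (\mu_2-\mu_1) + \tfrac{1}{2}\ln\left(\det\bar\Sigma / \sqrt{\det\Sigma_1 \det\Sigma_2}\right)$ with $\bar\Sigma = (\Sigma_1+\Sigma_2)/2$. The bisection helper, which finds a scaling $\alpha > 1$ for the equal-mean case $\Sigma_2 = \alpha I$, is a hypothetical illustration; the study does not state how $\alpha$ was obtained or whether it was greater than 1.

```python
import numpy as np

def bhattacharyya(mu1, s1, mu2, s2):
    """Return (mean term, covariance term) of the Bhattacharyya distance."""
    s = 0.5 * (s1 + s2)
    d = mu2 - mu1
    term1 = 0.125 * (d @ np.linalg.solve(s, d))
    _, ld = np.linalg.slogdet(s)
    _, ld1 = np.linalg.slogdet(s1)
    _, ld2 = np.linalg.slogdet(s2)
    term2 = 0.5 * (ld - 0.5 * (ld1 + ld2))
    return term1, term2

k = 9
eye = np.eye(k)
zero = np.zeros(k)

# Equal covariance, squared Mahalanobis distance 3: only the first term contributes.
t1, t2 = bhattacharyya(zero, eye, np.full(k, np.sqrt(3.0 / k)), eye)

def alpha_for_equal_means(k, target=3.0 / 8.0):
    """Bisection for alpha > 1 such that Sigma_2 = alpha * I gives B = target."""
    lo, hi = 1.0, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        b = sum(bhattacharyya(np.zeros(k), np.eye(k), np.zeros(k), mid * np.eye(k)))
        if b < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = alpha_for_equal_means(k)
print(t1, t2, alpha)
```

In the equal-mean case the first term vanishes, so the whole distance comes from the covariance term that a linear classifier cannot exploit.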

The performance curves of the ANN at large $N$ improve when a greater number of hidden nodes and a sufficient number of training epochs are used. The number of hidden nodes required to reach the optimal classification of $A_z = 0.89$ at $1/N = 0$ increases with the dimensionality of the feature space. Figure 8(b) shows the performance of the ANNs when the number of hidden nodes is equal to the dimensionality in each feature space. Since the number of weights to be trained increases rapidly with increasing number of nodes in an ANN, the number of epochs required for training the ANN to achieve a reasonable classification accuracy increases accordingly. The resubstitution and hold-out performance curves of each ANN shown in Fig. 8(b) were chosen at the smallest number of training epochs that resulted in approximately the highest $A_z$ value when the hold-out curve was extrapolated to $1/N = 0$. The number of training epochs required to reach the highest $A_z$ increased as the dimensionality and the number of hidden nodes in the ANN increased. It ranged from about 4000 to 10 000 for the conditions shown in Fig. 8(b). We did not attempt to perform an exhaustive search for the "optimal" number of hidden nodes in each feature space because of the extensive computation time required for the search. Instead, we evaluated ANNs with a few different numbers of hidden nodes in each feature space and chose the "best" ANN within those studied. With this approximation we observed that, in a $k$-dimensional feature space and with these class distributions, an ANN with approximately $k$ hidden nodes can approach the optimal performance when the design sample size and the number of training epochs are sufficiently large, as shown in Fig. 8(b).

Fig. 9. The dependence of the performance curves on the number of training epochs for an ANN with nine hidden nodes in a 9D feature space, ANN(9-9-1). Legend: it500 = 500 training epochs, etc.; solid lines = $A_z(\mathrm{tr})$, dashed lines = $A_z(\mathrm{ts})$. The expanded view in (b) shows the trend of the curves at large sample sizes.

To illustrate the training of an ANN with a large number of hidden nodes, we show the dependence of the resubstitution and the hold-out curves on the number of training epochs for ANN (9-9-1) in Fig. 9. A number of commonly discussed problems of an ANN can be observed. In the small $N$ region below about 60 samples per class, over-parametrization and over-training are obvious, i.e., near perfect classification during training [$A_z(\mathrm{tr})$ greater than 0.95] and poor generalization [$A_z(\mathrm{ts})$ below about 0.8]. The problem becomes more pronounced with an increasing number of training epochs. In the middle range of 200 to 400 samples per class, where $A_z(\mathrm{ts})$ increases to a maximum and then decreases with further training, an "optimal" number of training epochs exists. Only in the region with a sufficiently large $N$ (greater than about 500 per class) does $A_z(\mathrm{ts})$ increase with increasing number of training epochs within the range studied. The $A_z(\mathrm{ts})$-vs-$1/N$ curve becomes linear for $N$ greater than about 200. This dependence of ANN on training epoch is generally observed for ANNs with a large number of hidden nodes and in high-dimensional feature spaces, although the design sample size required in order to avoid over-training and over-parametrization varies. It reinforces our general experience that ANNs with a large number of weights can overfit the design samples easily and provide poor generalization when the sample size is small.

Fig. 10. The dependence of the performance curves of an ANN on the number of hidden nodes in the 9D feature space for class distributions with unequal covariance matrices and unequal means. Legend: F921 = ANN with two hidden nodes, etc.; solid lines = $A_z(\mathrm{tr})$, dashed lines = $A_z(\mathrm{ts})$.

The performance curves of ANNs with different numbers of hidden nodes in the 9D feature space are shown in Fig. 10. The curves for a given ANN were again chosen at a training epoch in which the hold-out curve approached approximately the highest performance at $1/N = 0$. The chosen training epoch ranged from 600 to 12 000 for the 2- to 15-hidden-node ANNs shown. When the number of hidden nodes is small, the highest $A_z$ obtained by extrapolation to $1/N = 0$ appears to be below the theoretical optimum of 0.89. For example, the $A_z$ extrapolated to $1/N = 0$ is about 0.85 for ANN (9-2-1) and is about 0.87 for ANN (9-6-1). The ANN with nine hidden nodes appears to approach the optimal $A_z$ of 0.89 in the limit of $1/N = 0$. However, the ANN (9-9-1) does not reach the approximately linear region until $N$ is greater than about 200 (easier to see in Fig. 9). As can be seen from the hold-out curves, increasing the number of hidden nodes further will increase overfitting, reduce generalizability, and increase training time without gaining true improvement in performance for classification of unknown case samples.

The quadratic classifier is the theoretically optimal classifier for the class distributions with unequal covariance matrices. It can optimally utilize the class separability contributed by both the differences in the means and the covariance matrices. The performance curves for the quadratic classifier (not shown) in feature spaces of different dimensionalities are very similar to those obtained for the equal covariance matrices situation [Fig. 7(c)]. The $A_z$ of the quadratic classifier reaches the optimal value of 0.89 in the limit of large $N$ for all dimensionalities studied.

Fig. 11. Comparison of the performance curves of the linear, quadratic, ANN(9-2-1), and ANN(9-9-1) classifiers in the 9D feature space for class distributions with unequal covariance matrices and unequal means. Legend: L = linear; Q = quadratic; ANN = neural network; solid lines = $A_z(\mathrm{tr})$, dashed lines = $A_z(\mathrm{ts})$.

Fig. 12. The dependence of the performance curves on dimensionality of feature space for the class distributions with unequal covariance matrices and equal means. (a) Linear, (b) ANN classifier. Legend: F3 = 3D feature space, etc.; F921 = ANN with two hidden nodes, etc.; solid lines = $A_z(\mathrm{tr})$, dashed lines = $A_z(\mathrm{ts})$.

Figure 11 shows a comparison of the performance of the linear, quadratic, and the ANN classifiers with two and nine hidden nodes. The biases on the resubstitution and the hold-out curves of the quadratic classifier are not as large as those of the ANN (9-9-1) classifier. However, in the regime of small design sample sizes, the hold-out curve of the optimal quadratic classifier can be much lower than the corresponding curves of the linear classifier or ANN with one or two hidden nodes. This result indicates that the theoretically optimal classifier may not be the optimal choice when the available design sample size is small and over-parametrization becomes an important consideration.

(3) Multivariate normal distributions, unequal covariance matrices and equal means: Figure 12(a) shows the dependence of $A_z$ on $1/N$ for the linear classifiers for the class distributions with equal means. Since the Mahalanobis distance is zero when the means of the two class distributions are equal, the linear classifier performs no better than random guessing in the hold-out situation [$A_z(\mathrm{ts}) = 0.5$]. However, it is somewhat surprising that the resubstitution curve can be biased to very high $A_z$ values when the design sample is small. The bias increases with increasing dimensionality of the feature space because the severity of overfitting in the design samples worsens with increased parameterization in the linear discriminant function. This indicates that the predicted performance of a classifier can be unrealistically optimistic if the test samples are not independent of the design samples.

For the class distributions with equal means, it is much more difficult to train the ANN classifier. The number of hidden nodes and the number of training epochs required for the ANN to approximate the decision surfaces, which are spherical hypersurfaces in the $k$-dimensional feature space, increase as $k$ increases. Figure 12(b) shows the $A_z$-vs-$1/N$ curves for the ANNs in which the number of hidden nodes is 2 times the dimensionality of the feature space. The number of training epochs required to approach the highest performance for a given ANN architecture ranges from about 1800 to 20 000 in these cases. Again we did not attempt an exhaustive search for the "optimal" number of hidden nodes in each case. These ANNs were chosen because they appear to approach the maximum performance of $A_z = 0.89$ in the limit of large $N$ and their number of hidden nodes is a simple multiple of the dimensionality. Compared to the class distributions with unequal means, for a given dimensionality, the number of hidden nodes and the number of training epochs required for achieving the near maximum performance at large $N$ are greater in this equal-mean situation. Figure 13(a) shows an example of the dependence of the performance curves on the number of hidden nodes in the 9D feature space. Figure 13(b) is an enlarged view of the curves in Fig. 13(a) in the range where the sample size is greater than 200 per class. The hold-out performance of ANN(9-9-1) at $1/N = 0$ reaches about 0.85. When the number of hidden nodes is greater than nine, the performances of the ANNs at $1/N = 0$ are similar and approach the optimal $A_z$.

Fig. 13. (a) The dependence of the performance curves of an ANN on the number of hidden nodes in the 9D feature space for class distributions with unequal covariance matrices and equal means. In the expanded scale (b), the approximately linear regions of the curves can be observed. Solid lines = $A_z(\mathrm{tr})$, dashed lines = $A_z(\mathrm{ts})$.

The quadratic discriminant is again the theoretically optimal classifier for the class distributions with unequal covariance matrices. Its performance curves (not shown) are very similar to those plotted in Fig. 7(c), except that the extrapolated $A_z$ values at $1/N = 0$ do not reach as high as those in the equal covariance matrices situation. By using the approximately linear region of the $A_z$-vs-$1/N$ curve at $N$ greater than 100, the extrapolated $A_z$ ranges from about 0.873 to 0.885 for the 3D to 15D feature spaces. In this case, it is much more efficient to train a quadratic discriminant than the ANN. Since the linear discriminant and ANNs with few hidden nodes cannot provide effective classification regardless of the design sample size, the quadratic discriminant is obviously the optimal classifier both in terms of performance and training efficiency.

(4) Checkerboard distributions: In a feature space with checkerboard class distributions, classification is difficult for many classifiers because of the disjoint clusters of samples belonging to the same class. We compared the three classifiers in such a situation by two examples. Figure 14 shows the performance curves of the three classifiers in a 2D feature space with a 2×3 unit checkerboard distribution. Both the linear and the quadratic discriminants perform poorly even for the resubstitution method, where $A_z$ values are in the range of 0.6 to 0.7. However, the ANN(2-3-1) can achieve an $A_z$ of 0.96 (not shown) and the ANN(2-5-1) a near-perfect classification at a training epoch of about 1200.

Fig. 14. Performance curves of the three classifiers for a 2×3 unit checkerboard in a 2D feature space. L = linear, Q = quadratic, ANN251 = backpropagation neural network with five hidden nodes. Solid lines = $A_z(\mathrm{tr})$, dashed lines = $A_z(\mathrm{ts})$.

In a 3D feature space with a 2×2×2 unit checkerboard distribution, the difficulty in classification experienced by the linear and quadratic discriminants is even more apparent. Figure 15 shows that the hold-out curve of the linear classifier is basically the same as random guessing. The hold-out curve of the quadratic classifier is slightly higher than 0.5 at small design sample sizes but approaches 0.5 as the design sample increases. On the other hand, the ANN(3-3-1) can attain a test $A_z$ of 0.9 (not shown) and the ANN(3-5-1) can reach near-perfect classification at large design sample sizes after about 1500 training epochs. These two examples demonstrate that an ANN classifier can be superior to the linear or quadratic classifiers for class distributions that are very different from the idealized multivariate normal distributions.
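
Checkerboard samples of the kind used in these two examples can be generated with a few lines of code. This is a minimal sketch under stated assumptions: unit boxes are assigned to the two classes by the parity of the sum of their integer cell indices, each box of a class is chosen with equal probability, and the function name and sample counts are illustrative.

```python
import numpy as np

def sample_checkerboard(shape, n_per_class, rng):
    """Draw uniform samples from alternating unit boxes.

    shape: number of boxes along each feature axis, e.g. (2, 3) or (2, 2, 2).
    Returns (class0, class1), each of shape (n_per_class, len(shape)).
    """
    cells = np.array(list(np.ndindex(*shape)))   # integer corner of every box
    parity = cells.sum(axis=1) % 2               # alternate classes by index parity
    out = []
    for label in (0, 1):
        boxes = cells[parity == label]
        pick = boxes[rng.integers(len(boxes), size=n_per_class)]
        # Uniform position inside the chosen unit box
        out.append(pick + rng.random((n_per_class, len(shape))))
    return out[0], out[1]

rng = np.random.default_rng(0)
normal, abnormal = sample_checkerboard((2, 3), 1000, rng)
normal3d, abnormal3d = sample_checkerboard((2, 2, 2), 1000, rng)
print(normal.shape, abnormal3d.shape)
```

Because each class occupies several disjoint boxes, no single hyperplane or quadratic surface separates the two sample sets, which is consistent with the poor performance of the linear and quadratic discriminants described above.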

IV.  DISCUSSION 

Classifier design is an important field of research in computer-aided diagnosis. Yet many of the issues related to classifier design have not been explored systematically. This simulation study is a part of our on-going investigation of the sample size effects on classifier design. In this study, we evaluated classifier performance for three multivariate normal class distributions with specific properties: equal covariance matrices, unequal covariance matrices, and equal means. These distributions are idealized but they do approximate a range of situations that may occur in real classification problems. Since the optimal classifier and the upper bound of classification accuracy in the limit of $1/N = 0$ are known for each of these cases, we can compare the performances of the classifiers under each condition with the optimum. In addition, a checkerboard class distribution was included in the study. A comparison of the performances of the different classifiers for this class distribution can illustrate their effectiveness when the distributions are very different from multivariate normal.

Fig. 15. Performance curves of the three classifiers for a 2×2×2 unit checkerboard distribution in a 3D feature space. Legend: L = linear, Q = quadratic, ANN351 = backpropagation neural network with five hidden nodes.

For all three classifiers, the $A_z(\mathrm{tr})$ obtained by resubstitution is biased optimistically while the $A_z(\mathrm{ts})$ obtained by testing with an independent test set is biased pessimistically, relative to the $A_z$ in the limit of $N \to \infty$, except for the situations when $A_z(\mathrm{tr})$ is bounded from above by perfect classification ($A_z = 1$) or when $A_z(\mathrm{ts})$ is bounded from below by random guessing ($A_z = 0.5$). The magnitude of the biases increases as the design sample size decreases and as the dimensionality of the feature space increases. In the cases where a given classifier has no discriminatory power for a given class distribution, for example, the linear discriminant for the equal-mean or checkerboard class distributions, or the quadratic discriminant for the 3D checkerboard class distribution, the test $A_z(\mathrm{ts})$ remains almost constant at 0.5, independent of the design sample size. In many cases, the $A_z$-vs-$1/N$ curve cannot be approximated by a straight line that extrapolates to the $A_z$ at $1/N = 0$ until the design sample sizes are very large, beyond the range of sample sizes that are generally available for CAD classifier design. To estimate the performance of a classifier at large $N$ under the constraint of a small design sample, one may use the Fukunaga and Hayes resampling scheme to derive several points along the $A_z$-vs-$1/N$ curves in the small sample size region. If the extrapolated resubstitution and hold-out curves do not converge to approximately the same $A_z$ at $1/N = 0$, an average of the points on the two curves which correspond to the same design sample size may be a closer estimate of $A_z$ than either $A_z(\mathrm{tr})$ or $A_z(\mathrm{ts})$. It may be noted that the resubstitution and the hold-out curves are not biased symmetrically from the $A_z$ at infinite $N$; the average thus obtained will only be a rough estimate. It is also not valid in cases when the classifier has no discriminatory power, with $A_z(\mathrm{ts})$ constant at about 0.5, or when the resubstitution curve is overly optimistic, with $A_z(\mathrm{tr})$ constant at about 1.
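
The extrapolation and averaging described in this paragraph amount to a straight-line fit of $A_z$ against $1/N$. The sketch below shows the arithmetic with a least-squares fit; the numerical values are hypothetical and chosen only for illustration, they are not results from this study, and the function name is an assumption.

```python
import numpy as np

def extrapolate_az(n_values, az_values):
    """Fit A_z = a + b / N by least squares and return a, the value at 1/N = 0."""
    inv_n = 1.0 / np.asarray(n_values, dtype=float)
    slope, intercept = np.polyfit(inv_n, np.asarray(az_values, dtype=float), 1)
    return float(intercept)

# Hypothetical points in the approximately linear region (illustration only)
n = [100, 200, 400, 800]
az_tr = [0.93, 0.91, 0.90, 0.895]      # resubstitution, biased upward
az_ts = [0.83, 0.86, 0.875, 0.8825]    # hold-out, biased downward

a_tr = extrapolate_az(n, az_tr)
a_ts = extrapolate_az(n, az_ts)

# Rough alternative: average the two curves at the same design sample size
avg = 0.5 * (np.array(az_tr) + np.array(az_ts))

print(f"extrapolated A(tr) = {a_tr:.3f}, A(ts) = {a_ts:.3f}")
print("averages at each N:", np.round(avg, 3))
```

In this made-up example the two biases are not symmetric, so the average at the smallest $N$ is closer to the common intercept than either curve but still not equal to it.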

In any case, caution should be taken in estimating classifier performance by extrapolation to $1/N = 0$ or by averaging the resubstitution and hold-out performance as discussed above. The estimated performance contains variances that have to be estimated using further tools. One such attempt in estimating the components of variance by a bootstrapping resampling scheme has been studied recently by Wagner et al. These estimates reveal the amount of bias and variance in the classifier performance obtained with the finite design samples, thus allowing estimation of the sample size required to achieve a desired degree of generalizability, rather than replacing the need for a larger sample set and further studies.

With the equal-covariance-matrix class distributions, the linear discriminant is the optimal classifier as expected. The biases are low and the computation is efficient. Moreover, since the $A_z$-vs-$1/N$ relationship is linear over almost the entire range of design sample sizes, the classifier performance at very large $N$ can be estimated from the small sample size performance by linear extrapolation, as suggested by Fukunaga and Hayes and demonstrated previously by Wagner et al.

With  the  unequal-covariance-matrices  and  equal-mean 
class  distributions,  the  linear  discnminani  and  the  back- 
propagation  neural  network  with  one  hidden  layer  arc  infe¬ 
rior  to  the  quadratic  classifier  when  the  design  sample  size  is 
large.  The  linear  discriminant  cannot  utilize  the  difference  in 
the  covariance  matrices  and  underestimates  the  class  separa¬ 
bility  even  when  an  infinite  number  of  design  samples  is 
available.  The  ANN  needs  a  relatively  large  number  of  hid¬ 
den  nodes  and  a  large  number  of  training  epochs  in  order  to 
reach  the  optimal  performance.  Its  hold-out  performance  and 
the  computation  efficiency  are  both  inferior  to  those  of  the 
quadratic classifier. However, for the unequal-covariance-matrices
and unequal-mean case and a small design sample
size, the linear classifier or an ANN with very few hidden
nodes, e.g., n = 2, provides better hold-out performance than
the  more  complex  ANNs  or  the  optimal  quadratic  classifiers. 
These  results  indicate  that  the  bias  on  classifier  performance 
increases  with  increasing  complexity  (loosely  related  to  the 
number of parameters to be estimated) of the classifier. The
linear classifier contains (k + 1) independent parameters and
the quadratic classifier contains (k + 1)(k + 2)/2 independent
parameters in their formulations. The number of weights to
be estimated for the ANN depends on the number of hidden
nodes as n(k + 1) + (n + 1). The number of weights in an
ANN can therefore easily exceed that of a quadratic classifier,
although the estimation of the mean and covariance matrices
for the linear and quadratic discriminants may contribute
additional “complexity” to the classifier design. Two
observations can be made. First, when the available sample
size is small, a simple classifier will have better generalization
than a more complex classifier. Second, a complex ANN
or  a  quadratic  classifier  trained  with  an  insufficient  number 
of  design  samples  generalizes  poorly,  even  if  it  is  the  optimal 
classifier  for  the  class  distributions.  It  is  therefore  important 
to  select  an  appropriate  classifier  by  taking  into  consideration 
the  design  sample  size. 
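
The parameter counts quoted above can be compared directly. The sketch below assumes that k denotes the number of input features and n the number of hidden nodes, which is how the surrounding text appears to use them; the function names and the choice k = 9 are illustrative only.

```python
def linear_params(k):
    """Independent parameters of the linear classifier: k + 1."""
    return k + 1


def quadratic_params(k):
    """Independent parameters of the quadratic classifier: (k + 1)(k + 2) / 2."""
    return (k + 1) * (k + 2) // 2


def ann_weights(k, n_hidden):
    """Weights of a one-hidden-layer ANN with a single output: n(k + 1) + (n + 1)."""
    return n_hidden * (k + 1) + (n_hidden + 1)


k = 9  # hypothetical feature-space dimensionality
print("linear:", linear_params(k))
print("quadratic:", quadratic_params(k))
for n_hidden in (2, 4, 6, 8):
    print(f"ANN with {n_hidden} hidden nodes:", ann_weights(k, n_hidden))
```

For this k, the ANN with two hidden nodes has fewer weights than the quadratic classifier has parameters, while the ANN with six hidden nodes already has more.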

A  further  problem  in  classifier  design  is  that  the  true 
population  distributions  of  the  classes  in  the  feature  space  are 
generally  unknown.  It  was  suggested  that  the  quantile- 
quantile  (Q-Q)  plot  and  the  chi-square  plot  may  be  used  for 
investigating  the  normality  of  univariate  and  multivariate 
sample distributions, respectively. However, it is still unknown
under what criteria the chi-square plot will indicate
that it is optimal to use a classifier designed under the normality
assumption. For any measure of goodness-of-fit, when
the  sample  size  is  small,  only  the  most  aberrant  deviations 
from  the  normal  distribution  can  be  identified  as  a  lack  of  fit 
from  these  plots.  Therefore,  there  is  often  no  a  priori 
knowledge  to  select  an  “optimal”  classifier  or  to  predict 
whether  the  observed  performance  is  caused  by  the  sample 
size,  the  choice  of  an  overly  complex  classifier,  or  by  an 
actual  poor  separation  of  the  classes  in  the  feature  space.  If 
one  observes  poor  generalization  of  a  trained  classifier  in  a 
truly independent test set, it will be important to take into
consideration  all  these  factors  and  redesign  the  classifier. 

In  this  study,  we  assumed  that  the  best  features  have  al¬ 
ready  been  determined  for  the  classification  task.  In  a  general 
classifier  design  problem,  the  best  set  of  features  usually  has 
to  be  selected  based  on  the  available  design  samples.  The 
feature  selection  step  will  introduce  additional  biases  to  the 
classifier  performance.  The  number  of  features  selected  also 
has  a  strong  influence  on  the  classifier  design,  as  can  be  seen 
from  the  dependence  of  the  bias  on  the  dimensionality  of  the 
feature space. The investigation of this more complex situation,
including both the feature selection and classifier training
steps, is underway.

The term generalizability is nonspecific and needs to be
qualified here. The present paper is concerned with the
generalizability of the mean performance of classifiers to
unknown test samples drawn from the same population of
cases. We have shown in this paper that the mean performance
of a classifier depends on the number of samples used
to train the classifier, the architecture of the classifier, and,
for multivariate-normal data, the means and covariances of
the population distributions. Suppose in this context that a
classifier is trained on a given finite number of design
samples (patients). The mean performance of the classifier
over independent replications with the same number of design
samples is generalizable to studies characterized by the
same number of design samples. In other words, the mean
resubstitution or hold-out performance is an unbiased estimate
for repeated sampling of independent design and test
sample sets, respectively, when the same number of design
samples is used. The classifier performance may not, however,
be generalizable to studies characterized by a different
number of design samples. In particular, when a very large
and representative design sample size is used, the mean
performance may be very different from the mean performance
that characterizes the finite-training-sample condition. When
the mean performance under the conditions of a finite design
sample size is close to that expected with a very large design
sample size, the finite-training-sample performance is said to
be generalizable to the population performance.

The term generalizability is not only used with respect to
mean performance; it is also used with respect to uncertainty
in performance, as reflected in estimates of error bars (standard
deviations, or the corresponding variances). For example,
if we think of repeating a given training and testing
experiment on a classifier and if only the test samples are
drawn independently on the repeated trials, then the estimated
uncertainties are said to be generalizable only to a
population of test samples. If, however, we think of repeating
the experiment and independently drawing new training
samples as well as new test samples, then the estimated
uncertainties are said to be generalizable to a population of
trainers and a population of testers. Models for the components
of variance in both paradigms are the subjects
of current work in progress. A key point of this latter
work is the fact that for computer-aided diagnosis most
available software for ROC analysis only provides estimates
of uncertainty that are generalizable to a population of test
samples.

In this investigation, we have limited our study to only
three types of classifiers: the linear discriminant, the quadratic
discriminant, and the backpropagation ANNs with one
hidden layer. There are, of course, many other variations of
the ANN architecture and other parametric or nonparametric
classifiers available for feature classification tasks. The purpose
of our work is not to exhaustively evaluate all possible
combinations of class distributions and classifiers. Rather, by
limiting our investigation to some well-known situations, we
can perform systematic analyses and gain some insights into
the classifier design problems. Furthermore, we have limited
our discussion here to the estimates of the mean classifier
performance. Wagner et al. have investigated the variances
of classifier performance estimated from a finite
sample set and developed models to study the relative importance
of the sizes of the training and test samples. It has
been demonstrated that a components-of-variance model can
be estimated with a finite sample set by using a bootstrap
method. More importantly, the analysis of variances can reveal
the generalizability of the performance estimates to
other training and test sample sets in the population. Our
long-term goals are to find some guidelines for designing
efficient resampling schemes that can minimize the bias and
variance of a trained classifier using the available samples,
and to provide a quantitative design tool that can estimate the
design sample size requirement for a larger “pivotal” study
from the results of a smaller “pilot” study in order to
achieve a desired precision in Az and the desired generalizability.

Improvement of Radiologists' Characterization of Mammographic Masses by Using Computer-aided Diagnosis: An ROC Study


PURPOSE: To evaluate the effects of computer-aided diagnosis (CAD) on radiologists'
classification of malignant and benign masses seen on mammograms.

MATERIALS  AND  METHODS:  The  authors  previously  developed  an  automated 
computer  program  for  estimation  of  the  relative  malignancy  rating  of  masses.  In  the 
present  study,  the  authors  conducted  observer  performance  experiments  with 
receiver  operating  characteristic  (ROC)  methodology  to  evaluate  the  effects  of 
computer  estimates  on  radiologists'  confidence  ratings.  Six  radiologists  assessed 
biopsy-proved  masses  with  and  without  CAD.  Two  experiments,  one  with  a  single 
view  and  the  other  with  two  views,  were  conducted.  The  classification  accuracy  was 
quantified by using the area under the ROC curve, Az.

RESULTS: For the reading of 238 images, the Az value for the computer classifier was
0.92. The radiologists' Az values ranged from 0.79 to 0.92 without CAD and
improved to 0.87-0.96 with CAD. For the reading of a subset of 76 paired views, the
radiologists' Az values ranged from 0.88 to 0.95 without CAD and improved to
0.93-0.97 with CAD. Improvements in the reading of the two sets of images were
statistically significant (P = .022 and .007, respectively). An improved positive
predictive value as a function of the false-negative fraction was predicted from the
improved ROC curves.

CONCLUSION:  CAD  may  be  useful  for  assisting  radiologists  in  classification  of 
masses  and  thereby  potentially  help  reduce  unnecessary  biopsies. 


Breast  cancer  is  the  most  prevalent  non-skin  cancer  in  women;  178,700  new  cases  are 
estimated  to  have  occurred  in  1998  (1).  The  mortality  of  breast  cancer  is  the  second  highest 
among  all  cancer  deaths  in  women  (1).  At  present,  there  is  no  effective  method  to  prevent 
breast  cancer.  The  best  approach  to  reducing  the  breast  cancer  mortality  rate  is  early 
detection  and  treatment.  Because  the  mammographic  features  of  early-stage  breast  cancers 
are  not  very  specific,  the  need  for  high  detection  sensitivity  leads  to  biopsy  of  many 
low-suspicion  lesions.  The  positive  predictive  values  (PPVs)  of  mammographic  signs  are, 
therefore,  often  below  30%  (2,3). 

Computer-aided  diagnosis  (CAD)  is  considered  to  be  one  of  the  approaches  that  may 
improve  the  efficacy  of  mammography  (4).  With  CAD,  a  computerized  detection  algorithm 
alerts  a  radiologist  to  the  location  of  the  suspicious  lesions,  and/or  a  trained  computer 
classifier  provides  the  radiologist  with  an  estimate  of  the  likelihood  of  malignancy  of  a 
lesion.  The  radiologist  takes  into  consideration  the  information  provided  by  the  computer 
before making a decision. This "second opinion" may improve the diagnostic accuracy
because  it  serves  as  a  form  of  double  reading  (5).  Furthermore,  a  computer  evaluation  is 
often  more  consistent  and  reproducible  than  a  human  decision  maker  (6). 

Considerable  research  has  been  devoted  to  the  development  of  computerized  schemes 
for  the  detection  and  classification  of  mammographic  abnormalities.  These  efforts  have 
advanced  the  CAD  technology  such  that  clinical  application  appears  to  be  possible  in  the 
Figure 1. Histograms illustrate the distributions of (a) size (ie, length of the long axis) and (b) visibility ranking (1 = obvious, 5 = subtle) of the 253
masses  included  in  the  data  set.  Because  classification  accuracy  depends  on  the  case  mix,  these  distributions  provided  some  information  on  the 
masses  in  the  data  set. 


near  future.  It  is,  therefore,  necessary  to 
evaluate  the  effects  of  CAD  on  radiolo¬ 
gists'  detection  and  diagnosis  of  mammo- 
graphic  lesions.  In  a  previous  receiver 
operating  characteristic  (ROC)  study,  we 
demonstrated  that  CAD  could  improve 
radiologists'  accuracy  in  the  detection  of 
subtle  microcalcifications  on  mammo¬ 
grams  (7).  Kegelmeyer  et  al  (8)  also  re¬ 
ported  an  improvement  in  radiologists' 
sensitivity  for  the  detection  of  spiculated 
masses  with  use  of  a  computer  aid.  For 
the  classification  of  mammographic  le¬ 
sions,  it  has  been  shown  that  a  computer 
classifier  that  estimated  the  likelihood  of 
malignancy  on  the  basis  of  mammographic 
features extracted by radiologists could improve
radiologists' accuracy in distinguishing
malignant from benign lesions (9-11).

We  previously  conducted  ROC  studies 
to  compare  the  performance  of  radiolo¬ 
gists  with  that  of  the  computer  (12)  and 
to  compare  radiologists'  ability  to  classify 
masses  with  and  without  CAD  (13).  Jiang 
et  al  (14)  also  performed  an  ROC  study  of 
the  effect  of  CAD  on  radiologists'  perfor¬ 
mance  in  classifying  microcalcifications. 
The  results  of  all  of  these  observer  perfor¬ 
mance  studies  indicate  the  potential  to 
improve  mammographic  interpretation 
with  a  computer  aid. 

We  have  developed  an  automated 
method  to  analyze  masses  seen  on  mam¬ 
mograms  (15-17).  A  mass  is  segmented 
from  its  surrounding  breast  tissue,  and  an 
image  transformation  technique  is  used 
to  transform  the  mass  margin  from  the 
polar  coordinate  system  to  the  Cartesian 
coordinate  system.  A  linear  discriminant 
classifier  then  extracts  the  useful  texture 
features  from  the  transformed  image  and 


merges  them  into  a  relative  malignancy 
rating.  Our  approach  is  different  from 
others  that  use  a  trained  classifier  to 
merge  radiologist-extracted  image  fea¬ 
tures  or  feature  codes  by  using  the  Ameri¬ 
can  College  of  Radiology  Breast  Imaging 
Reporting  and  Database  System  lexicon 
(9-11).  Our  fully  automated  method  has 
the  advantage  that,  unlike  a  human 
reader,  it  does  not  have  variability  in 
feature  recognition  and  coding.  In  addi¬ 
tion,  the  computer  may  be  able  to  extract 
some  information,  such  as  texture  fea¬ 
tures,  that  may  not  be  readily  perceived 
by  human  eyes.  We  conducted  an  ROC 
study  to  evaluate  whether  this  computer  aid 
can  improve  radiologists'  performance  in 
the  classification  of  mammographic  masses 
(13).  The  results  of  our  observer  perfor¬ 
mance  study  are  described  in  this  article. 

Other  investigators  also  have  reported 
on  automated  algorithms  for  the  classifi¬ 
cation  of  mammographic  masses  (18-21). 
The  methods  used  in  these  algorithms 
varied,  and  their  accuracy  in  classifica¬ 
tion  cannot  be  compared  directly  because 
of  the  differences  in  the  data  sets.  How¬ 
ever,  the  effects  of  CAD  on  radiologists' 
performance  are  not  expected  to  depend 
strongly  on  the  specific  algorithm  if  differ¬ 
ent  computer  aids  of  comparable  accuracy 
are  used.  Therefore,  the  applications  of  the 
findings  of  this  study  should  not  be  limited 
to  our  computerized  classification  aid. 

MATERIALS  AND  METHODS 
Data  Set 

The data set for this study consisted of
253 mammograms obtained in 103 patients.
Each image contained a biopsy-
proved  mass  that  was  evaluated  in  this 
study.  Some  cases  involved  multiple  views 
or  images  from  multiple  examinations. 
The  cases  were  randomly  selected  from 
patient  files  from  the  breast  imaging  divi¬ 
sion  of  a  National  Cancer  Institute- 
designated  national  cancer  center  with 
the  approval  of  the  Institutional  Review 
Board.  The  PPV  of  masses  recommended 
for  biopsy  at  this  center  is  about  25%- 
30%,  but  an  approximately  equal  number 
of  malignant  and  benign  masses  (127  and 
126,  respectively)  were  chosen  to  en¬ 
hance  the  statistical  power  in  this  ob¬ 
server  performance  study.  Any  images 
that  were  judged  to  be  technically  poor 
were  excluded. 

The  mammograms  were  acquired  with 
a  contact  technique.  The  dedicated  mam¬ 
mographic  systems  had  a  molybdenum 
anode  and  molybdenum  filter,  a  0.3-mm 
nominal  focal  spot,  and  a  reciprocating 
grid.  MinR/MinR-E  screen-film  systems 
(Eastman-Kodak,  Rochester,  NY)  were 
used  with  these  units.  Sixty-two  of  the 
malignant  masses  and  six  of  the  benign 
masses  were  judged  to  be  spiculated  by  a 
radiologist  (M.A.H.)  experienced  in  mam¬ 
mography.  The  radiologist  also  measured 
the  size  (ie,  longest  dimension)  and 
ranked  the  visibility  of  the  masses  on  a 
scale  of  1  (obvious)  to  5  (subtle)  relative 
to  the  range  of  visibility  of  masses  encoun¬ 
tered  in  clinical  practice.  For  a  description 
of  the  masses  included  in  the  data  set, 
histograms  of  the  size  and  visibility  of  the 
masses are shown in Figures 1a and 1b,
respectively. 

For  the  computer  analysis,  the  selected 
mammograms  were  digitized  with  a  laser 
Figure 2. Example of rubber-band-straightening transform for extraction of texture features in the margin region surrounding a mass. (a) Original
and (b) background-corrected images showing the region of interest with the mass, (c) mammogram showing an outline of the segmented mass, and
(d) rubber-band-straightening-transformed image of a 40-pixel-wide region surrounding the segmented mass.


imager (Lumisys DIS-1000, Los Altos, Calif)
at a pixel size of 0.1 × 0.1 mm and
12-bit gray levels. This imager has an
optical density range of about 0.0-3.5.
The  optical  density  on  the  film  was  digi¬ 
tized  linearly  to  pixel  value  at  a  calibra¬ 
tion  of  0.001  optical  density  unit/pixel 
value  in  the  optical  density  range  of 
about  0.0-2.8.  The  digitizer  deviated  from 
a  linear  response  at  an  optical  density 
higher  than  2.8. 

For  the  observer  experiments,  we  used 
laser-printed  images  of  the  digitized  mam¬ 
mograms  for  all  readings.  The  images 
were  printed  with  a  969HQ  laser  imager 
(Imation,  Oakdale,  Minn)  that  was  con¬ 
nected  to  a  Macintosh  computer  (Apple 
Computer,  Cupertino,  Calif)  through  a 
special  digital  interface.  The  interface  pro¬ 
vided  a  12-bit  in,  10-bit  out  look-up  table 
and  allowed  images  to  be  scaled  to  differ¬ 
ent  factors  with  15  interpolation  meth¬ 
ods.  Because  this  laser  imager  has  a  pixel 
size  of  about  0.085  mm,  we  enlarged  the 
images  by  about  18%  during  printing  to 
maintain  them  at  the  same  size  as  the 
original  mammograms.  One  of  the  inter¬ 
polation  methods  was  chosen  by  an  expe¬ 
rienced  radiologist  (M.A.H.),  who  in¬ 
spected  the  printed  images  with  a 
magnifier  and  evaluated  the  sharpness  of 
the  spicules  and  mass  boundaries.  Be¬ 
cause  of  the  small  pixel  size  used  for  both 


digitization  and  printing,  basically  no 
noticeable  blurring  of  the  masses  could 
be  seen  with  the  chosen  interpolation 
method.  The  images  were  also  inspected 
for  the  potential  contouring  effect  of 
10-bit  output  images,  but  no  noticeable 
artifacts  could  be  found.  A  linear  pixel 
value-to-output  optical  density  calibra¬ 
tion  curve  of  the  laser  imager  was  used  for 
the  printing.  All  images  were  printed 
with  the  same  settings. 
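
The enlargement quoted above follows from the ratio of the two pixel sizes. The short calculation below is a sketch of that arithmetic only; the function name is introduced here for illustration, and both pixel sizes are the approximate values given in the text.

```python
def print_scale_factor(digitizer_pixel_mm, printer_pixel_mm):
    """Factor by which the pixel count must grow so the print keeps its physical size."""
    return digitizer_pixel_mm / printer_pixel_mm


# 0.1-mm digitizer pixels printed on a laser imager with about 0.085-mm pixels.
scale = print_scale_factor(0.1, 0.085)
enlargement_percent = (scale - 1.0) * 100.0

print(f"scale factor = {scale:.3f}, enlargement = {enlargement_percent:.1f}%")
```

The result, a factor of roughly 1.18, is consistent with the enlargement of about 18% stated in the text.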

Computerized  Classification 
of  Masses 

Our  computerized  method  of  classify¬ 
ing  mammographic  masses  has  been  de¬ 
scribed  in  detail  previously  (15-17).  The 
method  is  summarized  as  follows:  A  re¬ 
gion  of  interest  that  contained  the  biopsy- 
proved  mass  was  identified  on  the  mam¬ 
mogram  by  the  radiologist.  Background 
correction  based  on  a  distance-weighted 
estimation  method  was  applied  to  the 
region  of  interest  to  reduce  the  low- 
frequency  density  variation  in  the  region. 
A  median-filtered  smoothed  image  and 
two  high-frequency  enhanced  images 
were  generated  from  the  background- 
corrected  region  of  interest.  The  smoothed 
and  enhanced  gray-level  values  at  each 
pixel  were  used  as  features  in  a  k-means 
clustering  algorithm  to  classify  the  pixels 


into  two  clusters;  one  was  the  mass,  and 
the  other  was  the  surrounding  breast 
tissue  background.  By  choosing  an  appro¬ 
priate  criterion,  a  mass  region  slightly 
smaller  than  the  actual  mass  that  was 
visible  on  the  image  was  segmented. 
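
The two-cluster pixel classification can be sketched with a plain k-means loop. This is a minimal illustration, not the authors' implementation: the per-pixel feature triples are hypothetical, the deterministic choice of starting centroids is an assumption, and the criterion used to make the segmented region slightly smaller than the visible mass is not reproduced.

```python
def kmeans_two_clusters(features, max_iter=100):
    """Split feature vectors into two clusters with plain k-means (k = 2)."""
    # Assumed deterministic start: vectors with the smallest and largest feature sums.
    centroids = [list(min(features, key=sum)), list(max(features, key=sum))]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for f in features:
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
            new_labels.append(0 if dists[0] <= dists[1] else 1)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for cluster in (0, 1):
            members = [f for f, lab in zip(features, labels) if lab == cluster]
            if members:
                centroids[cluster] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-pixel features: (smoothed, enhanced 1, enhanced 2) gray levels.
pixels = [(10, 1, 2), (12, 2, 1), (11, 1, 1), (50, 9, 8), (52, 8, 9), (49, 9, 9)]
labels, centroids = kmeans_two_clusters(pixels)
print(labels)
```

Which of the two clusters corresponds to the mass depends on the gray-level convention of the digitized image, so the sketch leaves the cluster labels uninterpreted.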

The  boundary  of  the  segmented  region 
was  smoothed  by  morphologic  filtering. 
A  new  image  transformation  technique, 
referred  to  as  the  rubber-band-straighten¬ 
ing  transform,  was  used  to  transform  a 
40-pixel-wide  region  that  surrounded  the 
segmented  mass  boundary  into  a  rectan¬ 
gular  region.  After  transformation,  the 
mass  margin  became  approximately  par¬ 
allel,  and  any  spicules  that  were  radiating 
from  the  mass  became  approximately  per¬ 
pendicular,  to  the  long  dimension  of  the 
rectangular  region.  The  rubber-band¬ 
straightening  transform  enabled  the  spic¬ 
ules  to  be  aligned  approximately  in  a 
uniform  direction  and  thus  facilitated  the 
extraction  of  texture  features  from  the 
margin  of  the  mass.  An  example  of  a 
rubber-band-straightening-transformed 
image  is  shown  in  Figure  2. 

Figure 3. Histogram of the test discriminant scores of the 253 masses
obtained from the linear discriminant classifier by using a "leave one
case out" training and test resampling scheme. For this classifier, a
smaller discriminant score corresponded to a higher likelihood of
malignancy. The discriminant scores were used as the decision
variable in the ROC analysis of classification performance.

Figure 4. Binormal distribution fitted to the histogram of the
discriminant scores of the malignant and benign masses. The discriminant
scores were linearly transformed into a relative malignancy
rating ranging from 1 to 10, where 1 corresponded to the most benign
rating and 10 corresponded to the most malignant rating. This
binormal distribution was shown to the observers during the training
session to explain the rating scale of the computer classifier.

Two types of texture features were
found to be useful for classification. The
first set of features included eight texture
measures derived from the spatial gray-level
dependence matrices of the rubber-band-straightening-transformed
image. A spatial gray-level dependence matrix element
p_{d,θ}(i, j) is the joint probability of the
occurrence of gray levels i and j for pixel
pairs that are separated by a distance d
and at a direction θ (22). For analysis of
the masses, the spatial gray-level dependence
matrices were constructed for 10
pixel distances (d = 1, 2, 3, 4, 6, 8, 10, 12,
16, 20 pixels) and in four directions (0°,
45°, 90°, 135°) relative to the mass boundary.
Therefore, a total of 320 spatial gray-level
dependence texture features were
extracted.
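
As an illustration of the spatial gray-level dependence matrix defined above, the sketch below builds the joint probability matrix for one distance and one direction on a small test image. It assumes the common symmetric counting convention (each pixel pair is counted in both orders), which the text does not specify, and it measures directions relative to the image rows rather than to a mass boundary. The function and variable names are introduced here for illustration.

```python
# Offsets (row step, column step) for the four directions; row index grows downward.
DIRECTION_OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def sgld_matrix(image, distance, direction, levels):
    """Joint probability of gray levels (i, j) for pixel pairs at a distance and direction."""
    step_r, step_c = DIRECTION_OFFSETS[direction]
    step_r, step_c = step_r * distance, step_c * distance
    n_rows, n_cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + step_r, c + step_c
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # assumed symmetric convention:
                counts[j][i] += 1  # count the pair in both orders
                total += 2
    if total == 0:
        raise ValueError("no pixel pairs at this distance and direction")
    return [[v / total for v in row] for row in counts]


# Small 4-level test image (not mammographic data).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = sgld_matrix(image, distance=1, direction=0, levels=4)

# Feature count stated in the text: 8 measures x 10 distances x 4 directions.
n_sgld_features = 8 * 10 * 4
print(p[2][2], n_sgld_features)
```

Each texture measure is then a scalar computed from one such matrix, which is how the ten distances and four directions multiply into 320 features.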

The  second  set  of  texture  features  was 
derived  from  the  run  length  statistics 
matrices  of  the  horizontal  and  vertical 
gradient  images  of  the  rubber-band- 
straightening-transformed  margin  region. 
Five  texture  measures  were  extracted  from 
the  run  length  statistics  matrix  in  each  of 
the  two  directions  (0°  or  90°)  on  each 
gradient  image.  A  total  of  20  run  length 
statistics  texture  features  were  thus  ob¬ 
tained.  Therefore,  we  had  a  total  of  340 
features  from  the  two  types  of  texture 
measures. 
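
A run length statistics matrix of the kind referred to above can be sketched as follows. This is an illustration under assumptions: it is applied to a small quantized test image rather than to a gradient image, only the 0° (along rows) and 90° (along columns) directions are handled, and the five texture measures themselves are not reproduced because the text does not list them.

```python
def run_length_matrix(image, direction, levels):
    """matrix[g][l - 1] = number of runs of gray level g with length l."""
    if direction == 0:
        lines = [list(row) for row in image]
    elif direction == 90:
        lines = [list(col) for col in zip(*image)]
    else:
        raise ValueError("this sketch handles only 0 and 90 degrees")
    max_run = max(len(line) for line in lines)
    matrix = [[0] * max_run for _ in range(levels)]
    for line in lines:
        run_value, run_len = line[0], 1
        for value in line[1:]:
            if value == run_value:
                run_len += 1
            else:
                matrix[run_value][run_len - 1] += 1
                run_value, run_len = value, 1
        matrix[run_value][run_len - 1] += 1  # close the last run on the line
    return matrix


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
rls_rows = run_length_matrix(image, direction=0, levels=4)

# Feature counts stated in the text: 5 measures x 2 directions x 2 gradient images,
# added to the 320 spatial gray-level dependence features.
n_rls_features = 5 * 2 * 2
n_total_features = 320 + n_rls_features
print(rls_rows, n_total_features)
```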

A  stepwise  linear  discriminant  feature 
selection  procedure  (23)  was  used  to  se¬ 
lect  the  most  effective  features  from  the 
available  feature  set.  A  total  of  41  features 
were  selected.  The  selected  features  were 
input into the Fisher linear discriminant
classifier  (24)  as  predictor  variables.  A 
"leave  one  case  out"  resampling  scheme 
was  used  to  train  and  test  the  classifier.  A 
histogram  illustrating  the  test  discrimi¬ 
nant  scores  of  the  253  masses  is  shown  in 
Figure  3.  For  this  classifier,  a  smaller  dis¬ 
criminant  score  corresponded  to  a  higher 
likelihood  of  malignancy.  By  using  the 
test  discriminant  score  as  the  decision 
variable, the performance of the computer
classifier could be evaluated by using
ROC analysis (17,25,26) and compared
with that of the radiologists, as
described later.
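
The "leave one case out" idea can be sketched as follows: all images that belong to one case are withheld together, the classifier is trained on the remaining cases, and the withheld images receive test scores. The sketch is a stand-in under stated assumptions: it uses a single hypothetical feature and a one-dimensional reduction of a linear discriminant instead of the 41-feature classifier, and in this sketch a larger score means more malignant, which is only a sign convention.

```python
def train_1d_discriminant(samples):
    """samples: list of (feature, label) pairs, label 1 = malignant, 0 = benign."""
    groups = {0: [], 1: []}
    for x, y in samples:
        groups[y].append(x)
    means = {y: sum(v) / len(v) for y, v in groups.items()}
    pooled_ss = sum((x - means[y]) ** 2 for x, y in samples)
    pooled_var = pooled_ss / (len(samples) - 2)
    weight = (means[1] - means[0]) / pooled_var
    midpoint = (means[0] + means[1]) / 2.0
    return lambda x: weight * (x - midpoint)


def leave_one_case_out(records):
    """records: list of (case_id, feature, label). Returns one test score per record."""
    scores = []
    for case_id, x, _ in records:
        # Exclude every image from the same case, not just the image being scored.
        train = [(f, y) for cid, f, y in records if cid != case_id]
        scores.append(train_1d_discriminant(train)(x))
    return scores


# Hypothetical data: some cases contribute more than one image.
records = [("A", 1.0, 0), ("A", 1.2, 0), ("B", 0.8, 0), ("C", 3.0, 1),
           ("C", 3.1, 1), ("D", 2.7, 1), ("E", 1.5, 0), ("F", 2.5, 1)]
test_scores = leave_one_case_out(records)
for (case_id, x, y), s in zip(records, test_scores):
    print(case_id, x, y, round(s, 2))
```

Grouping by case rather than by image keeps correlated views of the same mass out of the training set when that mass is being scored.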

Relative  Malignancy  Rating 
of  the  Masses 

For  the  observer  performance  study,  we 
provided  a  relative  malignancy  rating  of 
each  mass  to  the  observer  during  the 
reading  session  with  CAD.  The  relative 
malignancy  rating  was  obtained  by  tak¬ 
ing  a  linear  transformation  of  the  com¬ 
puter  classifier's  decision  variable  to  a 
range  of  1-10  and  rounding  the  value  to 
the  nearest  integer.  The  transformation 
also  reversed  the  relative  magnitude  of 
the  decision  variables  so  that  1  corre¬ 
sponded  to  the  highest  benignity  rating, 
and  10  corresponded  to  the  highest  malig¬ 
nancy  rating. 
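
A minimal sketch of such a transformation is given below. The end points `score_min` and `score_max` that anchor the linear map are assumptions introduced here; the text does not state how the range of discriminant scores was fixed.

```python
def relative_malignancy_rating(score, score_min, score_max):
    """Map a discriminant score linearly onto 1-10, reversing its direction.

    The smallest score (highest likelihood of malignancy) maps to 10 and the
    largest score maps to 1. score_min and score_max are assumed end points.
    """
    fraction = (score_max - score) / (score_max - score_min)
    rating = 1.0 + 9.0 * fraction
    rating = min(10.0, max(1.0, rating))  # clamp scores outside the assumed range
    return int(rating + 0.5)              # round to the nearest integer


# Hypothetical score range of -2.0 to 2.0.
for score in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(score, relative_malignancy_rating(score, -2.0, 2.0))
```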

The  purpose  of  the  transformation  was 
to  provide  a  simple  and  intuitive  relative 
scale  for  the  observer.  Because  the  trans¬ 
formation  was  linear  and  monotonic,  the 
distributions  of  the  normal  and  abnormal 
samples,  as  well  as  their  ROC  curves,  were 
not  affected,  with  the  exception  of  a 
small  error  caused  by  making  the  deci¬ 
sion  variables  discrete.  Furthermore,  the 
slope  a  and  intercept  b  parameters  that 
were  fitted  to  the  transformed  discrimi¬ 
nant  scores  for  the  normal  and  abnormal 
samples  by  using  the  labroc  program  (26) 
were  used  to  generate  a  binormal  distribu¬ 
tion.  The  fitted  binormal  distribution  with 
the  relative  malignancy  rating  on  a  1-10 
scale  (Fig  4),  together  with  the  computer's 
ROC  curve,  were  shown  and  explained  to 
the  observers  during  a  training  session. 
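
For readers unfamiliar with the binormal model, the sketch below shows how the a and b parameters determine both the pair of fitted distributions and the area under the ROC curve. It uses the standard binormal relations, with the benign distribution placed at N(0, 1) on a latent axis and the malignant distribution at N(a/b, 1/b^2), and Az = Φ(a / sqrt(1 + b^2)). The parameter values are hypothetical and are not the values fitted in this study.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(x, mean, sd):
    """Normal probability density function."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def binormal_az(a, b):
    """Area under the binormal ROC curve for slope b and intercept a."""
    return normal_cdf(a / math.sqrt(1.0 + b * b))


def binormal_densities(a, b, x):
    """Benign and malignant densities at x on the latent decision axis."""
    benign = normal_pdf(x, 0.0, 1.0)
    malignant = normal_pdf(x, a / b, 1.0 / b)
    return benign, malignant


a, b = 2.0, 1.0  # hypothetical parameters
az = binormal_az(a, b)
print(f"Az = {az:.4f}")
print(binormal_densities(a, b, 1.0))
```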


Observer  Performance  Study 

Two  ROC  experiments  (27)  were  con¬ 
ducted:  The  masses  were  evaluated  from  a 
single  view  in  the  first  experiment  and 
from  two  views  in  the  second  experi¬ 
ment.  The  location  of  the  biopsy-proved 
mass  was  marked  on  each  image  so  that 
the  correct  mass  was  evaluated  by  all 
observers.  The  observers  were  instructed 
to  ignore  any  other  possible  masses  on 
the  images.  Six  radiologists  (M.A.H., 
M.A.R.,  T.E.W.,  D.D.A.,  C.P.,  J.S.N.)  who 
are  approved  by  the  Mammography  Qual¬ 
ity  Standards  Act  and  have  7-20  years  of 
experience  in  interpreting  mammograms 
participated  in  the  observer  performance 
experiments. 

There  were  two  reading  sessions  in 
each  experiment — one  with  CAD  and  the 
other  without  CAD.  The  observers  were 
asked  to  rate  the  likelihood  of  malig¬ 
nancy  of  the  masses  on  a  10-point  confi¬ 
dence  rating  scale  under  all  reading  condi¬ 
tions.  In  the  first  session,  half  the 
observers  interpreted  the  images  without 
CAD,  and  the  other  half  interpreted  them 
with  CAD.  The  two  reading  sessions  in 
the  same  experiment  were  separated  by  at 
least  3  weeks,  and  the  two  experiments 
were  separated  by  6  months.  For  all  four 
reading  sessions,  the  observer  had  unlim¬ 
ited  time  to  read  each  case.  To  estimate 
the  average  reading  time  per  case  for  each 
observer,  the  reading  time  for  each  case 
was  recorded  by  using  a  stopwatch. 

In the first experiment, the data set of
253 single-view mammograms was divided
into a training set of 15 mammograms
and a study set of 238 mammograms
(117 benign, 121 malignant). In
each reading session, training was conducted
before the reading of the study
images. For the reading session with CAD,
the fitted binormal distributions of the
computer rating scores (Fig 4) for the
entire data set were explained to the
observer during training to familiarize
the observer with the computer's rating
scale. The computer rating of the mass
was displayed on each image. After reading
each training image, the observer was
told the results of biopsy of the mass.

Figure 5. ROC curve for computerized classification
of the 238 masses used in the observer
performance study with single-view reading.
The computer's ROC curve can be compared
with the radiologists' ROC curves obtained
from the single-view reading experiment illustrated
in Figures 6 and 8.

Each  observer  read  the  entire  data  set  in 
one  reading  session.  The  order  of  the 
study  images  was  randomized  by  a  ran¬ 
dom  number  generator.  The  random  se¬ 
quence  was  different  for  each  observer 
and  for  each  reading  session  by  the  same 
observer.  For  the  reading  session  with 
CAD,  the  observer  was  free  to  look  at  the 
computer  rating,  which  was  displayed  on 
the  image,  either  before  or  after  estimat¬ 
ing  the  likelihood  of  malignancy  of  the 
mass.  However,  each  observer  was  asked 
to  always  read  the  computer  rating  before 
making  a  final  decision.  The  observer  was 
not  informed  of  the  pathologic  results  of 
any  mass  on  the  study  images. 
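
The randomization of reading order can be sketched as below. The text states only that a random number generator was used; the seeding scheme, the observer labels, and the session labels in this sketch are hypothetical and serve to give every observer and session its own reproducible sequence.

```python
import random


def reading_order(n_images, observer_id, session_id):
    """Return a shuffled list of image indices for one observer and one session."""
    # Assumed scheme: seed a private generator from the observer and session labels.
    rng = random.Random(f"{observer_id}-{session_id}")
    order = list(range(n_images))
    rng.shuffle(order)
    return order


order_without = reading_order(238, "R1", "without_CAD")
order_with = reading_order(238, "R1", "with_CAD")
print(order_without[:10])
print(order_with[:10])
```

Seeding from the labels means the sequence for a given observer and session can be regenerated later, for example when matching recorded ratings back to image identifiers.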

The  second  experiment  was  very  simi¬ 
lar  to  the  first  experiment.  From  the  238 
single-view  mammograms,  76  matched 
pairs  (37  benign,  39  malignant)  of  cranio- 
caudal  and  mediolateral  oblique  or  lateral 
views  were  found.  Another  six  pairs  of 
two-view  mammograms  were  identified 
from  the  rest  of  the  images  and  used  as 
training  cases.  The  remaining  mammo¬ 
grams  were  either  single-view  images  or 
additional views of the pairs already chosen,
so they were not used in this experiment.
In this experiment, the observers
were  not  informed  of  the  pathologic  re¬ 
sults  of  any  study  case  in  any  reading 
session.  The  76  pairs  of  mammograms 
were  read  in  one  reading  session  by  each 
observer. 

For  the  reading  session  with  CAD,  the 
rating  of  the  mass  in  each  view  was 
displayed  on  the  respective  image.  The 
computer  ratings  of  the  mass  on  the  two 
views  were  generally  different.  It  was  up 
to  the  observer  to  decide  how  to  merge 
the  two-view  information.  Observers  were 
asked  to  give  a  single  rating  of  the  mass 
after  reading  both  views. 

ROC  Analysis 

The confidence ratings of each observer obtained from each reading
condition were analyzed by using ROC methodology, and the
classification accuracy was quantified by using the area under the
ROC curve, Az. A binormal distribution was fitted to the confidence
ratings by maximum likelihood estimation using the labroc program.
This program provides an estimate of the Az and of the a and b
parameters of the ROC curve. The statistical significance of the
difference in Az between the reading with CAD and that without CAD
was estimated with two methods: One was the Student paired t test
for observer-specific paired data; the other was the
Dorfman-Berbaum-Metz method for analysis of multireader, multicase
ROC data (28). The statistical significance of the difference in Az
for reading single-view and two-view mammograms was estimated by
using the Student paired t test for the six observers. The Student
paired t test takes into account the statistical variation of
readers, whereas the Dorfman-Berbaum-Metz method considers both
reader variation and case sample variation by means of an analysis
of variance approach. Therefore, the results of
Dorfman-Berbaum-Metz analysis can be generalized to the population
of readers as well as to the population of case samples.
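
As a rough illustration of the quantities named above, the following sketch computes Az from binormal parameters by using the standard binormal relation Az = Phi(a / sqrt(1 + b^2)) and computes a paired t statistic from per-reader Az values. The function names are hypothetical, and the inputs are the rounded single-view Az values from Table 1. It is not the labroc program and does not implement the Dorfman-Berbaum-Metz method.

```python
import math
from statistics import NormalDist, mean, stdev


def binormal_az(a, b):
    """Area under a binormal ROC curve with parameters a and b."""
    return NormalDist().cdf(a / math.sqrt(1.0 + b * b))


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Rounded single-view Az values for the six readers (Table 1).
without_cad = [0.84, 0.92, 0.86, 0.79, 0.86, 0.89]
with_cad = [0.87, 0.96, 0.91, 0.87, 0.92, 0.87]

t_stat, dof = paired_t(without_cad, with_cad)
```

Because the tabulated values are rounded to two decimals, the statistic obtained this way will not reproduce the reported P value exactly. It does exceed 2.571, the two-tailed 5% critical value for 5 degrees of freedom.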


Positive  Predictive  Value 

An ROC curve represents the entire range of operating conditions of
a diagnostic process and is independent of disease prevalence. When
the disease prevalence is known, any operating point on an ROC curve
can be used to derive the PPV and the corresponding false-negative
fraction (false-negative fraction = 1 - true-positive fraction) on
the basis of the following relationship:

PPV = TPF × P(M) / [TPF × P(M) + FPF × P(B)],

where TPF is the true-positive fraction, FPF is the false-positive
fraction at the chosen decision threshold, and P(M) and P(B) are the
prevalences of malignant and benign cases, respectively. By varying
the decision threshold, the dependence of the PPV on the
false-negative fraction can be derived.
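
The relationship above can be sketched in a few lines. The example below is a minimal illustration, assuming a binormal ROC curve of the form TPF = Phi(a + b × Phi^-1(FPF)); the function names and the a and b values are hypothetical, and the 25% prevalence is the value assumed later in this section.

```python
from statistics import NormalDist


def ppv(tpf, fpf, p_malignant):
    """PPV = TPF*P(M) / (TPF*P(M) + FPF*P(B)), with P(B) = 1 - P(M)."""
    p_benign = 1.0 - p_malignant
    denom = tpf * p_malignant + fpf * p_benign
    return tpf * p_malignant / denom if denom > 0 else float("nan")


def ppv_vs_fnf(a, b, p_malignant, n_points=99):
    """Sweep the decision threshold along a binormal ROC curve.

    Returns a list of (false-negative fraction, PPV) pairs.
    """
    nd = NormalDist()
    curve = []
    for i in range(1, n_points + 1):
        fpf = i / (n_points + 1)
        tpf = nd.cdf(a + b * nd.inv_cdf(fpf))
        curve.append((1.0 - tpf, ppv(tpf, fpf, p_malignant)))
    return curve


# Hypothetical binormal parameters; not estimates from this study.
curve = ppv_vs_fnf(a=2.0, b=1.0, p_malignant=0.25)
```

For instance, an operating point with TPF = 1.0 and FPF = 0.5 at 25% prevalence gives PPV = 0.25 / (0.25 + 0.375) = 0.40.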

Because  our  data  set  did  not  include 
masses  on  which  biopsy  had  not  been 
performed,  the  ROC  curves  obtained  in 
this  study  cannot  be  generalized  to  pre¬ 
dict  the  performance  of  the  computer 
classifier  and  radiologists  in  clinical  prac¬ 
tice.  However,  to  demonstrate  the  pos¬ 
sible  effect  of  CAD  on  the  PPV  in  the 
population  of  masses  in  which  biopsy  is 
likely  to  be  performed  under  the  current 
clinical  criteria,  we  can  estimate  the  PPV 
by  using  the  prevalence  of  the  malignant 
and  benign  masses  in  this  patient  group. 
Because  the  PPV  of  masses  sent  for  biopsy 
ranges  from  about  25%  to  44%  in  general 
and  from  about  25%  to  30%  at  our  institu¬ 
tion,  for  the  purposes  of  our  estimation, 
we  assumed  that  the  P(M)  was  25%  and 
the  P(B)  was  75%  in  this  population.  A 
higher  prevalence  of  malignant  cases 
would  cause  an  increase  in  the  PPV,  but 
the  trend  between  the  PPV  curves  with 
and  without  CAD  would  be  similar. 


RESULTS 


The  ROC  curve  illustrating  the  perfor¬ 
mance  of  the  computer  classifier  for  the 
238  study  mammograms  is  shown  in 
Figure  5.  The  ROC  curve  for  the  entire  set 
of  253  mammograms  (not  shown)  was 
almost  identical  to  that  of  the  238  study 
cases;  this  indicates  that  the  15  training 
cases  were  typical  of  the  238  cases  used  in 
the study. The Az values (± SD) for both
ROC curves were 0.92 ± 0.02.

For  the  first  experiment  of  reading  the 
238  single-view  mammograms,  the  ROC 
curves  for  the  readings  by  the  six  radiolo¬ 
gists  both  without  and  with  CAD  are 
shown  in  Figures  6a  and  6b,  respectively. 
The  Az  values  of  the  six  radiologists  for 
the  readings  with  and  without  CAD  are 
listed  in  Table  1. 

For  the  second  experiment  of  reading 
the  76  pairs  of  two-view  mammograms, 
the  ROC  curves  for  the  readings  by  the  six 
radiologists  both  without  and  with  CAD 
are shown in Figures 7a and 7b,
respectively. The Az values of the six
radiologists in this experiment are also
listed in Table 1.

Figure 6. ROC curves for the six observers for single-view reading of the masses (a) without CAD and (b) with CAD. (a, b) R1 = reader 1, R2 = reader 2,
R3 = reader 3, R4 = reader 4, R5 = reader 5, R6 = reader 6. Five of the six observers achieved an increase in the area under the ROC curve, Az, with CAD.


TABLE 1

Areas under the ROC Curves for the Classification of Masses with and without
CAD by the Six Radiologists

                        Az (Single View)*              Az (Two View)†
Radiologist No.         Without CAD    With CAD        Without CAD    With CAD
1                       0.84 ± 0.03    0.87 ± 0.02     0.90 ± 0.03    0.93 ± 0.03
2                       0.92 ± 0.02    0.96 ± 0.01     0.95 ± 0.02    0.97 ± 0.02
3                       0.86 ± 0.02    0.91 ± 0.02     0.92 ± 0.03    0.93 ± 0.03
4                       0.79 ± 0.03    0.87 ± 0.02     0.88 ± 0.04    0.95 ± 0.03
5                       0.86 ± 0.02    0.92 ± 0.02     0.93 ± 0.03    0.97 ± 0.02
6                       0.89 ± 0.02    0.87 ± 0.02     0.89 ± 0.04    0.93 ± 0.03
Az from average a, b
parameters              0.87           0.91            0.92           0.96

Note: Data are the mean ± SD.

* P = .022 for the difference between the Az values measured with CAD and those measured
without CAD, as determined by using the Student two-tailed t test. P = .020 for this difference, as
determined by using the Dorfman-Berbaum-Metz method.

† P = .007 for the difference between the Az values measured with CAD and those measured without
CAD, as determined by using the Student two-tailed t test. P = .026 for this difference, as
determined by using the Dorfman-Berbaum-Metz method.


The  average  ROC  curve  was  derived 
from  the  average  a  and  b  parameters  of 
the  six  individual  ROC  curves  for  a  given 
reading  condition  (27).  The  average  ROC 
curves  for  the  four  reading  conditions  are 
shown in Figure 8. The Az values of the
average ROC curves are listed in Table 1.
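
The averaging step described in this paragraph can be sketched as follows, assuming the usual binormal form TPF = Phi(a + b × Phi^-1(FPF)). The function name and the six (a, b) pairs are hypothetical placeholders, not the parameters estimated in this study.

```python
import math
from statistics import NormalDist, mean


def average_binormal_curve(params, fpf_grid):
    """Average the (a, b) pairs, then evaluate the binormal ROC curve.

    Returns the mean a, the mean b, the TPF at each FPF in fpf_grid,
    and the area under the averaged curve.
    """
    nd = NormalDist()
    a_bar = mean(a for a, _ in params)
    b_bar = mean(b for _, b in params)
    tpf = [nd.cdf(a_bar + b_bar * nd.inv_cdf(f)) for f in fpf_grid]
    az = nd.cdf(a_bar / math.sqrt(1.0 + b_bar ** 2))
    return a_bar, b_bar, tpf, az


# Hypothetical (a, b) estimates for six readers; not values from this study.
reader_params = [(1.5, 0.9), (2.1, 1.0), (1.6, 0.8),
                 (1.2, 1.1), (1.7, 0.9), (1.8, 1.0)]
grid = [i / 20 for i in range(1, 20)]
a_bar, b_bar, tpf, az = average_binormal_curve(reader_params, grid)
```

Averaging the parameters rather than the curves themselves keeps the result inside the binormal family, so a single Az can be quoted for the averaged curve.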

For the reading of the single-view mammograms, the performance of
the computer classifier was comparable to that of the radiologist
(reader 2) who had the highest classification accuracy (compare
Figs 5 and 6) and higher than the average performance of the six
radiologists (compare Figs 5 and 8). When the radiologists read the
images with the computer aid, the classification accuracy of five
radiologists improved (Table 1); the improvement in their Az values
ranged from 0.04 to 0.08. The average performance of the six
radiologists became comparable to that of the computer classifier.
The improvement in the radiologists' classification accuracy by
using CAD was statistically significant (P = .022, Student paired
t test; P = .020, Dorfman-Berbaum-Metz method). Reader 2 with CAD
obtained an Az value of 0.96, which was higher than that obtained
by the radiologist alone or by the computer alone.

A trend similar to that with the single-view readings was observed
with the two-view readings. The Az value of the computer classifier
for the corresponding 152 single-view masses was 0.91 ± 0.02. The
classification accuracy of all six radiologists improved when they
read the mammograms with the computer aid. The increase in the Az
values ranged from 0.01 to 0.07. The improvement was statistically
significant (P = .007, Student paired t test; P = .026,
Dorfman-Berbaum-Metz method). With CAD, two radiologists achieved
an Az value of 0.97, which was higher than that obtained by the
radiologists alone or by the computer alone. These results indicate
that the second opinion provided by the computer classifier might
have strengthened the radiologists' confidence in the
interpretation of some difficult cases but had less influence on
the radiologists' decision when the computer made mistakes or when
the radiologists were confident about their decision.

Figure 7. ROC curves for the six observers for two-view reading of the masses (a) without CAD and (b) with CAD. (a, b) R1 = reader 1, R2 = reader 2,
R3 = reader 3, R4 = reader 4, R5 = reader 5, R6 = reader 6. All six observers achieved an increase in the area under the ROC curve, Az, with CAD.

Figure 8. Average ROC curve obtained from the average a and b
parameters of the six individual ROC curves for each of the four
reading conditions. An improved ROC curve was achieved with CAD
in both the single-view and two-view reading experiments.

As can be seen from the data in Table 1, the radiologists' accuracy
in classifying masses by reading two-view mammograms was
consistently higher than that by reading single-view mammograms
(P = .008). This trend remained when they read the mammograms with
CAD (P = .007). These findings are consistent with the clinical
experience of the radiologists that at least two views of
mammograms are needed to effectively evaluate a suspicious lesion.

The PPV as a function of the false-negative fraction was derived
from the fitted ROC curves under the assumption that the prevalence
of malignant masses was 25% in the population of masses sent for
biopsy. The PPVs estimated for the six observers who read the
two-view mammograms with and without CAD are plotted in Figure 9.
CAD would provide an improvement in the PPV in the high
false-negative fraction range for all observers except readers 2
and 5. The increase in the PPV at a decision threshold of "no
missed malignant mass" (ie, false-negative fraction = 0) varied
over a wide range; the largest gain, 39%, would be achieved by
reader 2, and the smallest gain, 0%, would be achieved by reader 4.


DISCUSSION 


In  the  observer  experiment  of  reading 
two-view  mammograms  with  CAD,  we 
presented  the  computer's  rating  of  each 
view  separately.  The  decision  of  how  to 
merge  the  computer  ratings  of  the  two 
views  was  left  to  the  radiologist.  It  is  likely 
that  the  radiologists  took  the  conserva¬ 
tive  approach  of  using  the  highest  malig¬ 
nancy  rating  of  the  two  as  the  computer's 
overall  rating.  However,  it  also  might 
have  depended  on  whether  the  relative 
ranking between the two computer ratings
agreed with the observer's opinion.
In some cases, we observed that the radiologist's
rating was very different from
the computer's rating of either view.

Figure 9. PPV as a function of the false-negative fraction derived from the ROC curves for the six observers (Fig 7). The PPV was predicted for a
population of masses in which biopsy was likely to be performed under current clinical criteria and by assuming the prevalence of malignant masses
to be 25%. R1 = reader 1, R2 = reader 2, R3 = reader 3, R4 = reader 4, R5 = reader 5, R6 = reader 6.


Because  decision  making  is  a  complex 
process,  the  simple  approach  of  using  the 
highest  malignant  rating  or  the  average 
rating  from  multiple  views  may  not  be  the 
method  preferred  by  radiologists.  The  sepa¬ 
rate  ratings  that  we  used  in  this  study  would 
provide  less  biased  information.  Further  in¬ 
vestigation  is  needed  to  determine  the  best 
approach  of  presenting  the  computer's  rat¬ 
ings  to  radiologists  in  clinical  practice. 

To  obtain  insight  into  how  the  radiolo¬ 
gists  might  use  the  two-view  informa¬ 
tion,  we  compared  the  classification  re¬ 
sults  from  their  true  two-view  reading 
with  those  from  a  simulated  two-view 
reading  without  the  computer  aid.  The 
latter  results  were  derived  from  ratings  of 
single-view  readings  of  the  same  76  pairs 
of  mammograms  interpreted  in  experi¬ 
ment  2  by  assuming  two  strategies — one 
in  which  the  highest  malignancy  rating 
between  the  two  ratings  was  used,  and 
the  other  in  which  the  average  of  the  two 
ratings was used (Table 2). The Az values
for these classification ratings derived
from the single-view reading are listed in
Table 2. The corresponding Az values for
the computer classifier are also given in
Table 2 for comparison.
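
The two merging strategies can be illustrated with a short sketch. The ratings below are hypothetical, the function names are placeholders, and the area is computed with the empirical (Mann-Whitney) estimate rather than the binormal Az fitted by labroc, so the numbers are illustrative only.

```python
def merge_ratings(rating_view_a, rating_view_b, strategy="max"):
    """Combine two single-view ratings into one simulated two-view rating."""
    if strategy == "max":
        return max(rating_view_a, rating_view_b)
    if strategy == "average":
        return (rating_view_a + rating_view_b) / 2.0
    raise ValueError(f"unknown strategy: {strategy}")


def empirical_auc(malignant, benign):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    wins = 0.0
    for m in malignant:
        for bn in benign:
            if m > bn:
                wins += 1.0
            elif m == bn:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Hypothetical single-view ratings (1-10), one (view A, view B) pair per mass.
malignant_pairs = [(8, 9), (6, 9), (4, 7), (10, 9)]
benign_pairs = [(2, 3), (5, 2), (3, 8), (1, 1)]

auc = {}
for strategy in ("max", "average"):
    mal = [merge_ratings(a, b, strategy) for a, b in malignant_pairs]
    ben = [merge_ratings(a, b, strategy) for a, b in benign_pairs]
    auc[strategy] = empirical_auc(mal, ben)
```

With these made-up ratings the two strategies give slightly different areas, because a benign mass with one high rating is penalized more by the maximum than by the average.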


The Az values for the maximal rating
and the average rating were similar. Four
of the radiologists obtained higher Az
values at the true two-view reading; the
Az values obtained by the remaining two
radiologists were lower than those obtained
at the simulated two-view reading.
Although  the  difference  did  not  achieve 
statistical  significance  (P  =  .37)  and  both 
readings  included  intraobserver  varia¬ 
tions,  there  seemed  to  be  a  slight  trend 
toward  the  true  two-view  reading  being 
more  accurate  than  the  simulated  two- 
view  reading.  This  may  indicate  that  the 
radiologists  used  a  more  complex  deci¬ 
sion-making  process  to  interpret  the  two 
views  of  the  masses  than  that  of  simply 
maximizing  or  averaging  the  ratings  from 
each  view. 

In  this  study,  the  discriminant  scores  of 
the  masses  given  by  the  computer  classi¬ 
fier  were  transformed  into  a  relative  malig¬ 
nancy  rating.  The  relative  malignancy 
rating  scale  and  the  distribution  of  the 
malignant  and  benign  masses  along  the 
relative  rating  scale  were  explained  to  the 
observers in the training sessions. A relative
malignancy rating scale was used
because the true likelihood of malignancy
of the masses could not be estimated
from a small data set, as will be
explained. However, the relative rating
scale provided by the computer was adequate
for measuring the relative performance
of classification with and without
CAD in an ROC study.

TABLE 2

Estimation of the Malignancy Classification of 76 Masses by
Two-View Reading, as Simulated from Single-View Reading of
Mammograms by Radiologists without CAD

Radiologist No.    Maximal Rating    Average Rating
1                  0.94 ± 0.03       0.93 ± 0.03
2                  0.94 ± 0.03       0.94 ± 0.03
3                  0.84 ± 0.05       0.86 ± 0.04
4                  0.85 ± 0.04       0.83 ± 0.05
5                  0.88 ± 0.04       0.89 ± 0.04
6                  0.91 ± 0.03       0.92 ± 0.03
Computer           0.96 ± 0.02       0.96 ± 0.02

Note: Data are the mean ± SD. Two strategies were used: In one, the
highest of the malignancy ratings on each view was used; in the
other, the average between the two ratings was used.

Figure 10. Histograms illustrate the confidence ratings of reader 5 obtained by reading 76 two-view mammograms (a) without CAD and (b) with
CAD. The specificity of reader 5 at 100% sensitivity would increase from 5% (two of 37 masses) without CAD to 68% (25 of 37 masses) with CAD if an
appropriate decision threshold were chosen.

If  a  computer  classifier  is  trained  and 
tested  with  very  large  data  sets,  and  if 
both  the  malignant  and  benign  cases 
represent  random  samples  of  the  popula¬ 
tion,  then  the  likelihood  of  malignancy 
of  a  classified  mass  can  be  estimated  on 
the  basis  of  the  probability  distributions 
of  the  classifier's  test  output  scores  and 
the  prevalence  of  the  two  classes  of  masses 
in  the  patient  population.  However,  with 
a  relatively  small  data  set,  such  as  that 
used  in  this  and  other  observer  studies 
(14),  there  are  limitations.  First,  the  perfor¬ 
mance  of  a  classifier  trained  with  a  small 
sample  set  may  have  large  bias  and  vari¬ 
ance  (29-31).  Second,  the  data  set  in  this 
study  did  not  include  masses  on  which 
biopsy  was  not  performed,  so  it  did  not 
represent  a  random  sample  of  the  masses 
in  the  patient  population.  If  our  classifier 
were  applied  to  all  cases  of  solid  masses  in 
clinical  practice,  the  probability  distribu¬ 
tion  of  the  test  scores  for  the  two  classes 
of  masses  would  be  different  from  that  of 
the  current  data  set. 

If  we  ignore  the  patient  population  at 
large,  it  is  possible  to  estimate  the  likeli¬ 
hood  of  malignancy  of  a  mass  on  the 
basis  of  the  probability  distribution  of  the 
classifier  output  scores  by  using  the  preva¬ 
lence  of  the  two  classes  of  masses  in  this 
specific  data  set.  However,  the  likelihood 
of  malignancy  derived  in  this  way  will  be 
completely  different  from  the  true  likeli¬ 
hood  of  malignancy  of  a  mass  in  the 
patient  population.  This  can  be  easily 
seen  if  one  considers  that  the  same  mass 
with  the  same  discriminant  score  will 
have  a  smaller  likelihood  of  malignancy 


if  it  is  analyzed  within  a  data  set  that  has  a 
lower  prevalence  of  malignant  cases  than 
that  in  the  current  data  set. 
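
The dependence on prevalence described here follows from Bayes' rule, and a small numeric sketch makes it concrete. The likelihood ratio of 3.0 below is a hypothetical value chosen for illustration; the two prevalences are the 121 of 238 malignant images in the study set and the 25% assumed for masses sent for biopsy.

```python
def likelihood_of_malignancy(likelihood_ratio, prevalence):
    """Posterior probability of malignancy for a given classifier score.

    likelihood_ratio is p(score | malignant) / p(score | benign);
    prevalence is the prior probability of malignancy in the population.
    """
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratio for one discriminant score.
lr = 3.0
p_dataset = likelihood_of_malignancy(lr, 121 / 238)  # study-set prevalence
p_biopsy = likelihood_of_malignancy(lr, 0.25)        # assumed biopsy prevalence
```

Under these assumptions the same score corresponds to a likelihood of malignancy of about 0.76 in the study set but 0.50 at 25% prevalence, which is the effect the paragraph describes.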

Training  the  participating  radiologists 
with a "likelihood of malignancy" derived
from a small data set for the observer
experiment may mislead them if
they  encounter  a  similar  mass  in  their 
clinical  practice.  We,  therefore,  preferred 
to  use  a  "relative  malignancy  rating," 
which  is  independent  of  the  prevalences 
of  malignant  and  benign  masses  in  the 
data  set.  As  long  as  the  same  classifier  and 
the  same  linear  transformation  are  used 
for  classifying  masses,  the  relative  malig¬ 
nancy  rating  for  a  given  mass  will  remain 
the  same,  regardless  of  the  types  of  other 
masses  in  the  data  set.  When  a  computer 
classifier  is  implemented  in  a  clinical 
setting  and  its  performance  can  be  estab¬ 
lished  in  the  patient  population,  the  true 
likelihood  of  malignancy  of  a  given  mass 
can  be  estimated  and  provided  to  the 
radiologist.  The  true  likelihood  of  malig¬ 
nancy  may  be  a  more  informative  mea¬ 
sure  for  radiologists  in  the  clinical  applica¬ 
tion  of  CAD. 

For the reading of the 76 two-view mammograms, the results of the
ROC study indicated an improvement in the Az value for all six
radiologists when the computer aid was used. This indicates an
overall increase in the separation of confidence rating
distributions between the malignant and benign cases. The
histograms in Figure 10 illustrate the distributions of confidence
ratings with and without CAD for reader 5, who achieved the second
greatest improvement in both the Az value (Table 1) and the
separation of malignant from benign distributions. Without CAD,
this reader's ratings of the malignant cases ranged from 2 to 10. This
is  consistent  with  the  fact  that  biopsy  was 
performed  in  all  masses  in  the  data  set  to 
avoid  missing  the  malignant  cases.  With 
CAD,  there  was  marked  improvement  in 
the  separation  of  the  two  distributions.  It 
is  possible  to  set  a  decision  threshold  at  a 
confidence  rating  of  4,  below  which  bi¬ 
opsy  would  not  need  to  be  performed  and 
no  malignant  masses  would  be  missed. 
The  number  of  benign  masses  that  could 
be  identified  without  missing  a  malig¬ 
nant  mass  by  setting  an  appropriate 
threshold  would  increase  by  23  (out  of  76 
cases)  for  reader  5.  Five  of  the  six  radiolo¬ 
gists  in  our  ROC  study  achieved  an  im¬ 
provement  in  distinguishing  benign  from 
malignant  masses,  and  one  radiologist 
had  no  difference.  Although  the  improve¬ 
ment  of  the  five  radiologists  varied  over  a 
wide  range,  from  one  to  25  cases,  this 
result  indicates  a  strong  possibility  that 
CAD  can  be  used  to  reduce  the  number  of 
unnecessary  biopsies. 
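
The threshold argument in this paragraph can be sketched as a small calculation: place the threshold at the lowest rating given to any malignant mass and count the benign masses rated below it. The ratings and the function name below are hypothetical and are not the readers' data.

```python
def specificity_at_full_sensitivity(malignant_ratings, benign_ratings):
    """Return (threshold, benign masses spared, specificity) at 100% sensitivity.

    The threshold is the lowest malignant rating; every mass rated below it
    would not be sent for biopsy, so no malignant mass is missed.
    """
    threshold = min(malignant_ratings)
    spared = sum(1 for r in benign_ratings if r < threshold)
    return threshold, spared, spared / len(benign_ratings)


# Hypothetical confidence ratings on a 1-10 scale.
malignant = [4, 6, 7, 8, 9, 10, 10]
benign = [1, 2, 2, 3, 3, 4, 5, 6]
threshold, spared, spec = specificity_at_full_sensitivity(malignant, benign)
```

Applied to the counts in the Figure 10 caption, the same definition gives 2/37 (about 5%) without CAD and 25/37 (about 68%) with CAD.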

The  large  variation  in  improvement 
among  the  radiologists  may  have  been 
due  to  the  different  degrees  of  confidence 
that  they  had  in  the  computer  aid.  As 
with  any  new  diagnostic  tool,  this  confi¬ 
dence  is  influenced  by  the  experience  the 
radiologist  has  with  the  tool.  Although 
the  radiologists  received  training  before 
the  reading  sessions,  the  high  variability 
in  confidence  was  not  unexpected,  be¬ 
cause  this  ROC  study  was  the  first  in¬ 
stance  in  which  they  had  worked  with 
the  computer  aid.  Their  confidence  levels 
may  have  also  been  reflected  in  the  rela¬ 
tively  low  accuracy  of  classification  by 
some  radiologists  with  CAD  compared 
with  that  of  the  computer  classifier  alone. 

If a radiologist can increase his or her
confidence in the performance of a computer
aid by gaining more extensive clinical
experience, then he or she will likely
be  able  to  find  the  most  effective  way  of 
merging  his  or  her  judgment  with  the 
computer's  rating  and  thus  reduce  both 
interobserver  and  intraobserver  variabil¬ 
ity.  Because  a  radiologist  who  uses  CAD 
can  establish  a  meaningful  decision 
threshold  for  biopsy  only  after  becoming 
familiar  with  the  sensitivity  and  specific¬ 
ity  of  working  with  CAD,  the  radiologists 
in  this  study  were  not  asked  to  decide 
whether  biopsy  should  have  been  per¬ 
formed  on  a  mass.  Rather,  we  focused  on 
the  evaluation  of  changes  in  the  sensitiv¬ 
ity  and  specificity  of  the  radiologists' 
classification  of  masses  when  CAD  was 
used. 

In  this  ROC  study,  all  six  observers 
were  attending  radiologists  with  exten¬ 
sive  experience  in  the  interpretation  of 
mammograms.  It  is  possible  that  the  com¬ 
puter  aid  may  be  even  more  useful  to 
radiology  residents  or  radiologists  with 
less  experience  in  mammography.  The 
effect  of  CAD  on  mammographic  interpre¬ 
tation  by  less-experienced  readers  will  be 
a  subject  of  investigation  in  future  stud¬ 
ies. 

The  observers  were  allowed  unlimited 
time  to  read  each  case  in  this  ROC  study. 
To  obtain  an  estimate  of  the  change  in 
reading  time  with  CAD,  we  recorded  the 
reading  time  of  each  observer  in  each 
reading  session  by  using  a  stopwatch.  For 
the  single-view  reading  experiment,  the 
average  reading  time  per  image  without 
CAD  varied  from  4.3  seconds  to  17.1 
seconds  (mean  time  for  the  six  observers, 
7.8  seconds).  The  average  reading  time 
per  image  with  CAD  varied  from  4.2 
seconds  to  17.3  seconds  (mean  time,  7.3 
seconds).  For  the  two-view  reading  experi¬ 
ment,  the  average  reading  time  per  pair  of 
images without CAD varied from 6.6 seconds
to 16.0 seconds (mean time, 10.4
seconds). The average reading time per
pair  of  images  with  CAD  varied  from  7.6 
seconds  to  27.1  seconds  (mean  time,  13.5 
seconds). 

The  reading  time  essentially  did  not 
change  with  use  of  the  computer  aid  for 
the  single-view  readings.  For  the  two- 
view  readings,  the  radiologists  took  longer 
with  CAD,  probably  because  they  had  to 
merge  the  two  computer  ratings  and 
merge  the  computer  ratings  with  their 
own  evaluations.  Further  investigation  is 
needed  to  determine  whether  there  is  a 
trade-off  between  the  radiologist's  effi¬ 
ciency  and  the  method  of  presenting  the 
computer  rating  and  whether  the  reading 
time with CAD will depend on the experience
that the radiologist has with the
computer information.

In  the  observer  study,  we  used  laser- 
printed  mammograms  instead  of  the  origi¬ 
nal  mammograms  for  the  reading  experi¬ 
ments.  A  major  reason  is  that  it  is  difficult 
to  keep  all  the  original  mammograms 
together  for  the  entire  period  of  the  study 
because  they  are  part  of  active  patient 
files  and  thus  often  recalled  for  compari¬ 
son  with  new  studies  or  for  other  clinical 
reasons.  Because  the  maximum  optical 
density  of  laser-printed  images  was  3.1 
for  the  laser  imager  used,  the  contrast  on 
the  printed  mammograms  was  about  20% 
lower  than  that  on  the  original  mammo¬ 
grams.  Although  the  image  quality  was 
slightly  lower  than  that  of  the  original, 
the  laser-printed  digitized  images  were 
judged  to  be  adequate  for  reading  the 
details  of  the  masses  by  the  participating 
radiologists.  The  laser-printed  image  set 
might  also  be  considered  as  one  that  had 
slightly  more  subtle  masses  than  the  origi¬ 
nal  set  of  images.  Because  the  relative 
performance  of  two  modalities  is  mea¬ 
sured  in  ROC  experiments,  and  because 
the  readings  both  with  and  without  CAD 
in  this  study  were  conducted  with  the 
same  set  of  printed  images,  the  relative 
performance  of  the  two  readings  should 
be  valid.  It  should  also  be  noted  that  in 
order  for  a  computer  aid  that  uses  auto¬ 
mated  image  analysis  to  be  widely  ac¬ 
cepted,  direct  digital  mammography 
would  have  to  be  the  imaging  modality 
in  clinical  use.  Laser-printed  images  or 
soft-copy  monitors  will  be  the  display 
medium  for  the  digital  mammograms. 
The  use  of  laser-printed  images  for  this 
ROC  study  was  therefore  practical. 

In  our  observer  performance  experi¬ 
ment,  we  found  that  CAD  improved  the 
radiologists'  ability  to  distinguish  malig¬ 
nant  and  benign  masses.  This  is  consis¬ 
tent  with  the  results  of  other  studies 
(11,14)  in  which  a  statistically  significant 
improvement  (P  <  .001  in  both  studies) 
in  the  radiologists'  classification  accuracy 
by  using  CAD  was  found.  The  results  of 
the  former  study  (11)  further  showed  that 
the  PPV  of  a  recommendation  for  biopsy 
by  the  radiologists  was  significantly  in¬ 
creased  (P  <  .001).  In  our  approach,  the 
computer  classifier  automatically  ex¬ 
tracted  image  features,  whereas  in  the 
other  studies,  the  computer  classifier  used 
the  radiologist's  evaluation  and  other  pa¬ 
tient  information  as  input.  Therefore,  it 
appears  that  CAD  can  provide  a  useful 
second  opinion  to  radiologists,  either  by 
consistently  extracting  and  analyzing  the 
image  features  or  by  optimally  weighting 
various diagnostic factors and thereby
improving the consistency in the decision-making
process. This suggests that a
computer  classifier  that  combines  both 
approaches — that  is,  automatically  ex¬ 
tracts  image  features  and  optimally 
merges  them  with  the  radiologist's  evalu¬ 
ation  and  patient  information — may  be 
even  more  effective  for  breast  cancer  diag¬ 
nosis.  The  latter  step  will  also  improve 
the  radiologist's  utilization  of  the  com¬ 
puter  rating  on  the  basis  of  the  computer- 
extracted  features;  this  utilization  was 
found  to  have  large  interobserver  varia¬ 
tion  in  our  ROC  experiment. 

In  conclusion,  an  ROC  study  of  the 
effects  of  CAD  on  radiologists'  classifica¬ 
tion  of  malignant  and  benign  masses  on 
mammograms  was  conducted.  The  re¬ 
sults  showed  that  CAD  can  provide  a 
statistically  significant  improvement  in 
the classification accuracy (that is, in the
Az value) for both single-view reading
(P  =  .022)  and  two-view  reading  (P  = 
.007).  The  improved  separation  between 
the  confidence  ratings  of  the  malignant 
masses  and  those  of  the  benign  masses 
indicates  the  potential  that  CAD  may 
reduce  the  rate  of  biopsy  of  benign  masses 
when  decision  thresholds  are  properly 
chosen  by  the  radiologists.  The  decision 
threshold  may  vary  among  radiologists, 
as  in  the  case  of  mammographic  interpre¬ 
tation  without  CAD,  and  can  be  set  after 
the  radiologist  working  with  CAD  has 
established  his  or  her  sensitivity  and  speci¬ 
ficity  with  this  approach  through  clinical 
experience. 

Further  studies  are  needed  to  evaluate 
the  effects  of  CAD  on  the  accuracy  of 
radiologist  classification  of  masses  in  clini¬ 
cal  settings  in  which  the  prevalence  of 
malignant  masses  is  different  from  that  in 
a  laboratory  data  set  and  the  likelihood  of 
malignancy  of  a  mass  can  be  estimated  by 
the  computer  classifier.  In  the  two-view 
reading  ROC  experiment,  the  reading  time 
per  case  increased  by  about  30%  with  the 
use  of  CAD.  The  dependence  of  the  radi¬ 
ologist's  efficiency  in  reading  with  CAD 
on  the  presentation  method  and  on  the 
reader's  experience  in  using  the  computer 
information also warrants further investigation.

Abstract — A new type of classifier combining an unsupervised
and  a  supervised  model  was  designed  and  applied  to  classifi¬ 
cation  of  malignant  and  benign  masses  on  mammograms.  The 
unsupervised  model  was  based  on  an  adaptive  resonance  theory 
(ART2)  network  which  clustered  the  masses  into  a  number  of 
separate  classes.  The  classes  were  divided  into  two  types:  one 
containing  only  malignant  masses  and  the  other  containing  a  mix 
of  malignant  and  benign  masses.  The  masses  from  the  malignant 
classes  were  classified  by  ART2.  The  masses  from  the  mixed 
classes  were  input  to  a  supervised  linear  discriminant  classifier 
(LDA).  In  this  way,  some  malignant  masses  were  separated 
and  classified  by  ART2  and  the  less  distinguishable  benign  and 
malignant  masses  were  classified  by  LDA.  For  the  evaluation  of 
classifier performance, 348 regions of interest (ROIs) containing
biopsy-proven masses (169 benign and 179 malignant) were used.
Ten different partitions of training and test groups were randomly
generated using an average of 73% of ROIs for training and
27% for testing. Classifier design, including feature selection and
weight  optimization,  was  performed  with  the  training  group. 
The  test  group  was  kept  independent  of  the  training  group.  The 
performance  of  the  hybrid  classifier  was  compared  to  that  of 
an  LDA  classifier  alone  and  a  backpropagation  neural  network 
(BPN).  Receiver  operating  characteristics  (ROC)  analysis  was 
used  to  evaluate  the  accuracy  of  the  classifiers.  The  average  area 
under the ROC curve (Az) for the hybrid classifier was 0.81 as
compared  to  0.78  for  the  LDA  and  0.80  for  the  BPN.  The  partial 
areas  above  a  true  positive  fraction  of  0.9  were  0.34,  0.27  and 
0.31  for  the  hybrid,  the  LDA  and  the  BPN  classifier,  respectively. 
These  results  indicate  that  the  hybrid  classifier  is  a  promising 
approach  for  improving  the  accuracy  of  classification  in  CAD 
applications. 

Index Terms: Computer-aided diagnosis, hybrid classifier,
mammography,  neural  networks. 

1.  Introduction 

Mammography  is  the  most  effective  method  for 
detection of early breast cancer [1]. However, the
specificity  for  classification  of  malignant  and  benign  lesions 
from  mammographic  images  is  relatively  low.  Clinical  studies 
have shown that the positive predictive value (i.e., ratio of the
number  of  breast  cancers  found  to  the  total  number  of  biopsies) 
is  only  15%  to  30%  [2]-[4].  It  is  important  to  increase  the 
positive  predictive  value  without  reducing  the  sensitivity  of 
breast  cancer  detection.  Computer-aided  diagnosis  (CAD)  has 
the  potential  to  increase  the  diagnostic  accuracy  by  reducing 
the  false-negative  rate  while  increasing  the  positive  predictive 
values  of  mammographic  abnormalities. 

Classifier  design  is  an  important  step  in  the  development 
of  a  CAD  system.  A  classifier  has  to  be  able  to  merge 
the  available  input  feature  information  and  make  a  correct 
evaluation.  Commonly  used  classifiers  for  CAD  include  linear 
discriminants  (LDA)  [5],  [6]  and  backpropagation  neural  net¬ 
works  (BPN)  [7]-[9]  which  have  been  shown  to  perform  well 
in  lesion  classification  problems  [10]-[22].  These  classifiers 
are  generally  designed  by  supervised  training.  However,  these 
types  of  classifiers  have  limitations  dealing  with  the  nonlin¬ 
earities  in  the  data  (in  case  of  LDA)  and  in  generalizability 
when  a  limited  number  of  training  samples  are  available 
(especially  BPN).  Another  classification  approach  is  based  on 
unsupervised  classifiers,  which  cluster  the  data  into  different 
classes  based  on  the  similarities  in  the  properties  of  the  input 
feature  vectors.  Therefore,  unsupervised  classifiers  can  be  used 
to  analyze  the  similarities  within  the  data.  However,  it  is 
difficult  to  use  them  as  a  discriminatory  classifier  [29],  [30]. 
They  also  have  limited  generalizability  when  the  training 
sample  set  is  small. 

We  propose  here  a  hybrid  unsupervised/supervised  struc¬ 
ture  to  improve  classification  performance.  The  design  of 
this  structure  was  inspired  by  neural  information  processing 
principles  such  as  self  organization,  decentralization  and  gen¬ 
eralization.  It  combines  the  adaptive  resonance  theory  network 
(ART2)  [26],  [27]  and  the  LDA  classifier  as  a  cascade  system 
(ART2LDA).  The  self-organizing  unsupervised  ART2  network 
automatically  decomposes  the  input  samples  into  classes  with 
different  properties.  The  ART2  network  has  been  found  to 
perform  better  compared  to  conventional  clustering  techniques 
in  terms  of  learning  speed  and  discriminatory  resolution  for  the 
detection  of  rare  events  in  many  classification  tasks  [28]-[30]. 
The  supervised  LDA  then  classifies  the  samples  belonging  to 
a  subset  of  classes  that  have  greater  similarities.  By  improving 
the  homogeneity  of  the  samples,  the  classifier  designed  for  the 
subset  of  classes  may  be  more  robust. 

The  ART2LDA  design  implements  both  structural  and  data 
decomposition.  Decomposition  is  a  powerful  approach  that  can 
reduce the complexity of a problem. Both structural decomposition
and data decomposition can improve classification
accuracy  [23]  as  well  as  model  accuracy  [24].  However, 
decomposition  can  also  reduce  the  prediction  accuracy  due  to 
overfitting  the  training  data.  We  will  demonstrate  in  this  paper 
that  the  proposed  hybrid  structure  can  reduce  the  overfitting 
problem  and  improve  the  prediction  capabilities  of  the  system. 
The  performance  of  the  hybrid  ART2LDA  classifier  will  be 
compared  with  those  of  an  LDA  alone  or  a  BPN  classifier. 

The rest of the paper is organized as follows. In Section II,
the ART2 unsupervised network is described. A hybrid
ART2LDA classifier is introduced in Section III. Section IV
describes  the  data  set  used  in  this  study.  The  results  are 
presented  in  Section  V.  Section  VI  contains  discussion  of 
these  results.  Finally,  Section  VII  concludes  this  investigation. 

II. ART2 Unsupervised Neural Network

The ART2 is a self-organizing system that can simulate
human pattern recognition. ART2 was first described by Grossberg
[25], and a series of further improvements was carried
out by Carpenter, Grossberg, and coworkers [26]-[28]. The
ART2 network clusters the data into different classes based on
the properties of the input feature vectors. The members within
a class have similar properties. ART2 network learning must
balance plasticity against stability (the plasticity-stability
dilemma). Plasticity is the ability of the system to discover
and  remember  important  new  feature  patterns.  Stability  is 
the  ability  of  the  system  to  remain  unchanged  when  already 
known  feature  patterns  with  noise  are  input  to  the  system.  The 
balance  between  plasticity  and  stability  for  the  ART2  training 
algorithm  allows  fast  learning  [28],  i.e.,  rare  events  can  be 
memorized  with  a  small  number  of  training  iterations  without 
forgetting  previous  events.  The  more  conventional  training 
algorithms,  such  as  back  propagation  [7]-[9],  perform  slow 
learning,  i.e.,  they  tend  to  average  over  occurrences  of  similar 
events  and  require  many  training  iterations. 

The  structure  of  the  ART2  system  is  shown  in  Fig.  1.  It 
consists  of  two  parts:  the  ART2  network  and  the  learning  stage. 
Suppose that there are $n$ input features $x_i$ ($i = 1, \ldots, n$) and $k$
classes in the ART2 network. When a new vector is presented
to the input of the ART2 network, an activation value $p_j$ for
class $j$ is calculated as

$$p_j = \sum_{i=1}^{n} x_i w_{ij}, \quad j = 1, \ldots, k \qquad (1)$$

where $w_{ij}$ is the connection weight between input $i$ and class
$j$. The activation value is a measure of the membership of the
particular input feature vector to class $j$. The higher the value
$p_j$ is, the better the input vector matches class $j$. The maximum
value $p_r$ is selected from all $p_j$ ($j = 1, \ldots, k$) to find the best
class  match.  Furthermore,  in  order  to  balance  the  contribution 
to  the  activation  value  from  all  feature  components,  the  input 
feature  values  applied  to  the  ART2  system  are  scaled  between 
zero  and  one  [30].  This  normalization  will  allow  detection  of 
similar  feature  patterns  even  when  the  magnitudes  of  the  input 
feature  components  are  very  different. 
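
The activation in (1) and the best-match selection can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation; the function names and the per-feature bounds `lo` and `hi` (assumed to be taken from the training set) are hypothetical.

```python
def scale_features(x, lo, hi):
    """Min-max scale each feature to [0, 1] using per-feature bounds.

    lo and hi are assumed to be the training-set minimum and maximum of
    each feature; a constant feature is mapped to 0.
    """
    return [(xi - l) / (h - l) if h > l else 0.0 for xi, l, h in zip(x, lo, hi)]


def activations(x, weights):
    """Eq. (1): p_j = sum_i x_i * w_ij; weights[j] is the vector of class j."""
    return [sum(xi * wij for xi, wij in zip(x, w_j)) for w_j in weights]


def best_class(x, weights):
    """Return (r, p_r): the index and value of the maximum activation."""
    p = activations(x, weights)
    r = max(range(len(p)), key=p.__getitem__)
    return r, p[r]
```

For example, with two classes whose weight vectors are `[1.0, 0.0]` and `[0.0, 1.0]`, the scaled input `[0.5, 1.0]` yields activations 0.5 and 1.0, so the second class is the best match.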

The learning stage of the ART2 system can influence the
weights of the selected class or the complete ART2 network
structure by adding a new class. An additional parameter, the
vigilance, is used to determine the type of learning [26]. The
vigilance parameter $p_{vig}$ is a threshold value that is compared
to the maximum activation value $p_r$. If $p_r$ is larger than $p_{vig}$,
then the input vector is considered to belong to class $r$. The
adaptation of the weights connected with class $r$ is performed
as follows:

$$w_{ir}^{new} = w_{ir}^{old} + \eta \left( x_i - w_{ir}^{old} \right), \quad \text{for } i = 1, \ldots, n \qquad (2)$$

where $\eta$ is a learning rate. The adaptation of the class $r$ weights
in (2) aims at maximization of the $p_r$ value for the particular
input vector. In an iterative manner, the weights are adjusted
so that the activation values produced for similar input vectors
will be maximum only for the class to which they belong and
these maximum activation values will be higher than $p_{vig}$.

If the maximum activation value $p_r$ is smaller than $p_{vig}$, it is
an indication that a novelty has appeared and a new class will
be added to the ART2 structure. The new weights connecting
the input with the new class ($k + 1$) are initialized with the
scaled input feature values of this novelty. In such a way, the
activation value $p_{k+1}$ will be maximum ($p_r = p_{k+1}$) and higher
than $p_{vig}$ when computed for this novelty in further training
iterations. The value of the vigilance parameter $p_{vig}$ determines
the resolution of ART2. It can be chosen in the range between
zero and one. If $p_{vig}$ is relatively small, only
very different input feature vectors will be distinguished and
separated into different classes. If $p_{vig}$ is relatively large, the
input feature vectors that are more similar will be separated
into different classes. The value of $p_{vig}$ is selected differently
depending on the particular application.
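
The learning stage described above can be sketched as a single training step. This is a simplified illustration rather than the full ART2 dynamics of [26]-[28]. It assumes that input vectors are normalized to unit Euclidean length, so that activations can be compared with a vigilance between zero and one, and that the weight update takes the standard competitive-learning form. The names `unit` and `art2_step` and the default learning rate are hypothetical.

```python
import math


def unit(v):
    """Scale v to unit Euclidean length (an assumption of this sketch)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else list(v)


def art2_step(x, weights, vigilance, eta=0.1):
    """Present one input vector; update the best class or add a new one.

    weights is a list of class weight vectors and is modified in place.
    Returns the index of the class assigned to x.
    """
    x = unit(x)
    if weights:
        p = [sum(xi * wi for xi, wi in zip(x, w)) for w in weights]
        r = max(range(len(p)), key=p.__getitem__)
        if p[r] > vigilance:
            # Assumed form of (2): move the winning weights toward the input.
            weights[r] = [w + eta * (xi - w) for xi, w in zip(x, weights[r])]
            return r
    # Novelty: initialize a new class with the scaled input itself.
    weights.append(list(x))
    return len(weights) - 1
```

With a vigilance of 0.9, the inputs `[1, 0]`, `[0.99, 0.1]`, and `[0, 1]` produce two classes, because the third vector is too dissimilar from the first class. With a vigilance of 0, the same sequence stays in one class, which illustrates how the vigilance controls the resolution.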

III. ART2LDA Classifier

Despite  the  good  performance  of  ART2  for  efficient  clus¬ 
tering  and  detection  of  novelties,  the  fast  learning  approach 
can  cause  problems  associated  with  the  generalization  capa¬ 
bility  of  the  system  and  the  correct  classification  of  unknown 
cases.  Supervised  classifiers  such  as  linear  discriminants  or 
backpropagation  neural  network  classifiers  can  have  better 
generalization  capability  than  ART2,  because  they  are  trained 
by  averaging  over  similar  event  occurrences.  However,  the 
learning  process  in  these  traditional  learning  algorithms  tends 
to erase the memory of previous expert knowledge when a new
type  of  expertise  is  being  learned.  Therefore,  these  classifiers 
do  not  have  as  good  an  ability  to  correctly  classify  rare  events 
as  ART2  [28],  [29]. 

In  order  to  improve  the  accuracy  and  generalization  of  a 
classifier,  we  propose  to  design  a  hybrid  classifier  that  com¬ 
bines  the  unsupervised  ART2  network  and  a  supervised  LDA 
classifier.  This  hybrid  classifier  (ART2LDA)  utilizes  the  good 
resolution  capability  of  ART2  and  the  good  generalization 
capability  of  LDA.  The  ART2  first  analyzes  the  similarity  of 
the  sample  population  and  identifies  a  subpopulation  that  may 
be  separated  from  the  main  population.  This  will  improve  the 
performance  of  the  second-stage  LDA  if  the  subpopulation 
causes  the  sample  population  to  deviate  from  multivariate 
normal  distributions  for  which  LDA  is  an  optimal  classifier. 
Therefore,  the  ART2  serves  as  a  screening  tool  to  improve 
the  homogeneity  of  the  sample  distributions  by  classifying 
outlying  samples  into  separate  classes. 

The  ART2LDA  hybrid  classifier  can  be  described  as 

$$y_{AL} = g(f_2(x)) \, f_1(x) + 1 - g(f_2(x)) \qquad (3)$$

where $x$ is the input vector, $f_1(\cdot)$ is the LDA classifier, $f_2(\cdot)$ is
the ART2 classifier, and $g(\cdot)$ is a binary membership function,
which labels the classes identified by ART2 to be one of the
two types: malignant class or mixed class. A particular class
is defined as malignant if it contains only malignant members.
It is defined as mixed if it contains both malignant and benign
members. The membership function is defined as follows:

$$g(c) = \begin{cases} 0, & \text{if } c \text{ is a malignant class} \\ 1, & \text{if } c \text{ is a mixed class.} \end{cases} \qquad (4)$$

The type of a given class is determined based on ART2
classification of the training data set.

The structure of the ART2LDA classifier is shown in Fig. 2.
The ART2 classifies the input sample $x$ into either a malignant
or a mixed class. Depending on the class type, the function
$g(\cdot)$ determines whether the LDA classifier will be used.
If $x$ is classified into a mixed class, the final classification
will be obtained based on the LDA classifier. However, if
$x$ is classified by ART2 into a malignant class, then the
mass will be considered malignant, without using the LDA
classifier. Therefore, in the ART2LDA structure, the ART2
is used both as a classifier and a supervisor. This can be
seen in (3). The first term in (3), $g(f_2(x)) f_1(x)$, is the LDA
classifier multiplied by the ART2 control part $g(f_2(x))$. The
second term in (3), $(1 - g(f_2(x)))$, gives the classification
result of the ART2 stage. If $f_2(x)$ is a malignant class, then
$g(f_2(x)) = 0$, the LDA stage is eliminated, and the classifier
output $y_{AL}$ is equal to 1. On the other hand, if $f_2(x)$ is a
mixed class, then $g(f_2(x)) = 1$, the ART2 term is eliminated,
and the final classification is determined by the LDA classifier
($y_{AL} = f_1(x)$).
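
The cascade in (3) can be sketched in a few lines. The function names below are hypothetical. As an assumption of this sketch, a class that happens to contain only benign training members is treated as mixed, so that its samples are passed to the LDA stage; the text above defines only the malignant and mixed types.

```python
def label_classes(assignments, is_malignant):
    """Label each ART2 class from the training data.

    A class is "malignant" only if every training member is malignant;
    any class with a benign member is "mixed" (assumption: this includes
    classes with only benign members).
    """
    types = {}
    for c, malignant in zip(assignments, is_malignant):
        if c not in types:
            types[c] = "malignant" if malignant else "mixed"
        elif not malignant:
            types[c] = "mixed"
    return types


def art2lda_output(art2_class, class_types, f1_x):
    """Output of (3): g * f1(x) + (1 - g), with g = g(f2(x)).

    art2_class  : class index assigned to x by the ART2 stage, f2(x)
    class_types : mapping from class index to "malignant" or "mixed"
    f1_x        : LDA discriminant score f1(x) for the same sample
    """
    g = 0 if class_types[art2_class] == "malignant" else 1
    return g * f1_x + (1 - g)
```

A sample that falls into a malignant class receives the output 1 regardless of its LDA score, whereas a sample in a mixed class receives its LDA score unchanged.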

IV.  Methods 

A.  Data  Set 

Fig. 2. Structure of the ART2LDA classifier.

The mammograms used in this study were randomly selected
from the files of patients who had undergone biopsies
at the University of Michigan. The criterion for inclusion
of  a  mammogram  in  the  data  set  was  that  the  mammogram 
contained  a  biopsy-proven  mass.  The  data  set  contained  348 
mammograms with a mixture of benign (n = 169) and
malignant (n = 179) masses. On each mammogram, a region
of  interest  (ROI)  containing  the  mass  was  identified  by  a 
radiologist  experienced  in  breast  imaging.  The  visibility  of 
the  masses  was  rated  by  the  radiologist  on  a  scale  of  1  to  10, 
where  the  rating  of  1  corresponds  to  the  most  visible  category. 
The  distributions  of  the  visibility  rating  for  both  the  malignant 
and  benign  masses  are  shown  in  Fig.  3.  The  visibility  ranged 
from  subtle  to  obvious  for  both  types  of  masses.  It  can  be 
observed  that  the  benign  masses  tend  to  be  more  obvious  than 
the  malignant  ones.  Additionally  the  likelihood  of  malignancy 
for  each  mass  was  estimated  based  on  its  mammographic 
appearance.  The  radiologist  rated  the  likelihood  of  malignancy 
on  a  scale  of  1  to  10,  where  1  indicated  a  mass  with  the  most 
benign  appearance.  The  distribution  of  the  malignancy  rating 
of  the  masses  is  shown  in  Fig.  4. 

The  data  set  can  be  considered  as  representative  of  the 
patient  population  that  is  sent  for  biopsy  under  current  clinical 
criteria.  Some  characteristics  of  many  malignant  and  benign 
masses  can  be  visually  distinguished  by  radiologists.  However, 
there  is  also  a  nonnegligible  fraction  of  malignant  masses  that 
are  very  similar  to  benign  masses  (the  low  malignancy  rating 
region  in  Fig.  4).  The  estimated  likelihood  of  malignancy  of 
malignant  and  benign  masses  that  are  sent  for  biopsy  basically 
overlaps  over  the  entire  range.  This  is  consistent  with  the  fact 
that  in  order  not  to  miss  malignant  masses  radiologists  must 
recommend  biopsy  for  even  very  low  suspicion  lesions. 

Fig. 3. The distribution of the visibility ranking of the masses in the dataset.
The ranking was performed by an experienced breast radiologist (1: very
obvious, 10: very subtle).

Fig. 4. The distribution of the malignancy ranking of the masses in the
dataset. The ranking was performed by an experienced breast radiologist (1:
very likely benign, 10: very likely malignant).

Three hundred and five of the mammograms were digitized
with a LUMISYS DIS-1000 laser scanner at a pixel resolution
of 100 μm × 100 μm and 4096 gray levels. The digitizer
was calibrated so that gray level values were linearly and
inversely proportional to the optical density (OD) within the
range of 0.1 to 2.8 OD units, with a slope of -0.001 OD/pixel
value. Outside this range, the slope of the calibration curve
decreased gradually. The OD range of the digitizer was 0
to 3.5. The remaining 43 mammograms were digitized with
a LUMISCAN 85 laser scanner at a pixel resolution of 50
μm × 50 μm and 4096 gray levels. The digitizer was
calibrated so that gray level values were linearly and inversely
proportional to the OD within the range of 0 to 4 OD units,
with a slope of -0.001 OD/pixel value. In order to process the
mammograms digitized with these two different digitizers, the
images digitized with the LUMISCAN 85 digitizer were averaged
with a 2 × 2 box filter and subsampled by a factor of two,
resulting in 100 μm images.
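
The resolution matching step can be illustrated with a short sketch. It assumes that the 2 × 2 box filter and the subsampling are aligned, so that the operation is equivalent to averaging non-overlapping 2 × 2 blocks; the function name is hypothetical, and the image is represented as a list of rows for simplicity.

```python
def downsample_2x2(image):
    """Average non-overlapping 2x2 blocks of a 2-D list of gray levels.

    This halves the resolution in each direction (e.g., 50 um pixels
    become 100 um pixels). A trailing odd row or column is dropped,
    which is a choice made for this sketch.
    """
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```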

In  order  to  validate  the  prediction  abilities  of  the  classifier, 
the  data  set  was  partitioned  randomly  into  training  and  test 
subsets  on  a  3:1  ratio,  under  the  constraints  that  both  the 
malignant  and  the  benign  samples  were  split  with  the  3:1  ratio 
and  that  the  images  from  the  same  patient  were  grouped  into 
the same (training or test) subset. These constraints caused
the subsets to deviate from an exact 3:1 ratio. The data set
was  repartitioned  randomly  ten  times.  On  average,  73%  of  the 
samples  were  grouped  into  the  training  set  and  27%  into  the 
test  set.  The  training  and  test  results  from  the  ten  partitions 
were  averaged  to  reduce  their  variability. 
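
One way to realize such a partition is sketched below. It is an illustration under stated assumptions, not the procedure actually used in the study: patients are stratified by whether they have any malignant mass, and whole patients are assigned to the test subset. The function name, the tuple layout, and the seed handling are hypothetical.

```python
import random


def split_by_patient(samples, test_fraction=0.25, seed=0):
    """Split ROI indices into training and test subsets by patient.

    samples is a list of (patient_id, is_malignant) tuples, one per ROI.
    All ROIs of a patient go to the same subset, and roughly
    test_fraction of the patients in each stratum go to the test subset.
    """
    rng = random.Random(seed)
    patient_malignant = {}
    for pid, malignant in samples:
        patient_malignant[pid] = patient_malignant.get(pid, False) or malignant
    strata = {True: [], False: []}
    for pid, malignant in patient_malignant.items():
        strata[malignant].append(pid)
    test_patients = set()
    for pids in strata.values():
        rng.shuffle(pids)
        test_patients.update(pids[: round(len(pids) * test_fraction)])
    train = [i for i, (pid, _) in enumerate(samples) if pid not in test_patients]
    test = [i for i, (pid, _) in enumerate(samples) if pid in test_patients]
    return train, test
```

Because whole patients are moved between subsets, the resulting ROI counts generally deviate from an exact 3:1 ratio, as noted above. Calling the function with ten different seeds gives ten repartitions.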

B. Feature Extraction

A  rectangular  ROI  was  defined  to  include  the  radiologist- 
identified  mass  with  an  additional  surrounding  breast  tissue 
region  of  at  least  40  pixels  wide  from  any  point  of  the  mass 
border.  A  fully  automated  method  was  then  used  for  segmen¬ 
tation  of  the  mass  from  the  breast  tissue  background  within 
the  ROI.  The  rubber  band  straightening  transform  (RBST)  was 
previously  developed  [12]  to  map  a  band  of  pixels  surrounding 
the  mass  onto  the  Cartesian  plane  (a  rectangular  region).  In  the 
transformed  image,  the  border  of  mass  appears  approximately 
as  a  horizontal  edge  and  spiculations  appear  approximately 
as  vertical  lines.  The  transformation  of  the  radially  oriented 
textures  surrounding  the  mass  margin  to  a  more  uniform 
orientation  facilitates  the  extraction  of  texture  features. 

The texture features used in this study were calculated from
spatial gray-level dependence (SGLD) matrices [10]-[12],
[31], and run-length statistics (RLS) matrices [32] computed
from the RBST images. The (i, j)th element of the SGLD
matrix is the joint probability that gray levels i and j occur in
a direction θ at a distance of d pixels apart in an image. Based
on our previous studies [10], a bit depth of eight was used in
the SGLD matrix construction, i.e., the four least significant
bits of the 12-bit pixel values were discarded. Thirteen texture
measures,  including  correlation,  energy,  difference  entropy,  in¬ 
verse  difference  moment,  entropy,  sum  average,  sum  entropy, 
inertia,  sum  variance,  difference  average,  difference  variance, 
and  two  types  of  information  measure  of  correlation  were  used. 
These  measures  were  extracted  from  each  SGLD  matrix  at 
ten different pixel pair distances (d = 1, 2, 3, 4, 6, 8, 10, 12, 16,
and  20)  and  in  four  directions  (0°,  45°,  90°,  and  135°). 
Therefore,  a  total  of  520  SGLD  features  were  calculated 
for  each  image.  The  definitions  of  the  texture  measures  are 
given  in  the  literature  [10]-[12],  [31].  These  features  contain 
information  about  image  characteristics  such  as  homogeneity, 
contrast,  and  the  complexity  of  the  image. 
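
The SGLD construction can be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes that pixel pairs are counted in both orders, so that the matrix is symmetric, and it shows only two of the thirteen measures, using their common definitions (energy as the sum of squared probabilities, inertia as the probability-weighted squared gray-level difference). All names are hypothetical.

```python
def reduce_bit_depth(image, drop_bits=4):
    """Discard the least significant bits (12-bit to 8-bit for drop_bits=4)."""
    return [[p >> drop_bits for p in row] for row in image]


def sgld_matrix(image, d, direction):
    """Joint probability of gray levels (i, j) at distance d along direction.

    direction is a (row step, column step) pair: (0, 1) for 0 degrees,
    (-1, 1) for 45, (-1, 0) for 90, and (-1, -1) for 135 degrees.
    Returns a sparse dict {(i, j): probability}.
    """
    dr, dc = direction[0] * d, direction[1] * d
    counts, total = {}, 0
    for r in range(len(image)):
        for c in range(len(image[0])):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < len(image) and 0 <= c2 < len(image[0]):
                i, j = image[r][c], image[r2][c2]
                # Count the pair in both orders (symmetric matrix).
                for key in ((i, j), (j, i)):
                    counts[key] = counts.get(key, 0) + 1
                total += 2
    return {k: v / total for k, v in counts.items()}


def energy(p):
    """Sum of squared joint probabilities."""
    return sum(v * v for v in p.values())


def inertia(p):
    """Probability-weighted squared gray-level difference (contrast)."""
    return sum((i - j) ** 2 * v for (i, j), v in p.items())
```

Evaluating thirteen such measures at ten distances and four directions gives the 13 × 10 × 4 = 520 SGLD features quoted above.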

RLS  texture  features  were  extracted  from  the  vertical  and 
horizontal  gradient  magnitude  images,  which  were  obtained 
by  filtering  the  RBST  image  with  horizontally  or  vertically 
oriented  Sobel  filters  and  computing  the  absolute  gradient 
value  of  the  filtered  image.  A  gray  level  run  is  a  set  of 
consecutive,  collinear  pixels  in  a  given  direction  which  have 
the  same  gray  level  value.  The  run  length  is  the  number  of 
pixels  in  a  run  [32].  The  RLS  matrix  describes  the  run  length 
statistics for each gray level in the image. The (i, j)th element
of  the  RLS  matrix  is  the  number  of  times  that  the  gray  level  i 
in  the  image  possesses  a  run  length  of  j  in  a  given  direction. 
In  our  previous  study,  it  was  found  experimentally  that  a  bit 
depth  of  five  in  the  RLS  matrix  computation  could  provide 
good  texture  characteristics  [12]. 

Five texture measures, namely, short run emphasis, long run
emphasis, gray level nonuniformity, run length nonuniformity,
and run percentage, were extracted from the vertical and
horizontal gradient images in two directions, θ = 0° and θ =
90°. Therefore, a total of 20 RLS features were calculated for
each  ROI.  The  formal  definition  of  the  RLS  feature  measures 
can  be  found  in  [32]. 
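
The run-length statistics can be illustrated with a short sketch for the horizontal direction (θ = 0°). It shows only two of the five measures, using their common definitions: short run emphasis weights each run by the inverse square of its length, and long run emphasis by the square of its length, both normalized by the total number of runs. The function names are hypothetical, and the sketch is not the implementation used in the study.

```python
def run_length_matrix(image):
    """Horizontal run-length counts for a 2-D list of gray levels.

    Returns a sparse dict in which rl[(g, n)] is the number of runs of
    gray level g with length n.
    """
    rl = {}
    for row in image:
        start = 0
        for c in range(1, len(row) + 1):
            # A run ends at the end of the row or when the gray level changes.
            if c == len(row) or row[c] != row[start]:
                key = (row[start], c - start)
                rl[key] = rl.get(key, 0) + 1
                start = c
    return rl


def short_run_emphasis(rl):
    """Sum of counts divided by squared run length, over the number of runs."""
    total = sum(rl.values())
    return sum(v / (n * n) for (_, n), v in rl.items()) / total


def long_run_emphasis(rl):
    """Sum of counts times squared run length, over the number of runs."""
    total = sum(rl.values())
    return sum(v * n * n for (_, n), v in rl.items()) / total
```

Applying the same function to the transposed image gives the vertical direction (θ = 90°).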

A  total  of  540  features  (520  SGLD  and  20  RLS)  were 
therefore  extracted  from  each  ROI. 

C.  Feature  Selection 

In  order  to  reduce  the  number  of  the  features  and  to  obtain 
the  best  feature  set  to  design  a  good  classifier,  feature  selection 
with  stepwise  linear  discriminant  analysis  [33]  was  applied. 
At  each  step  of  the  stepwise  selection  procedure  one  feature 
is  entered  or  removed  from  the  feature  pool  by  analyzing 
its  effect  on  the  selection  criterion.  In  this  study,  the  Wilks' 
lambda  (the  ratio  of  within-group  sum  of  squares  to  the  total 
sum  of  squares  [34])  was  used  as  a  selection  criterion.  The 
optimization procedure used a threshold $F_{in}$ for feature entry
and a threshold $F_{out}$ for feature removal. On a feature entry
step, the features not yet selected are entered into the selected
feature pool one at a time, and the significance of the change in the
Wilks' lambda caused by each feature is estimated based on F
statistics. The feature with the highest significance is entered
into the feature pool if its significance is higher than $F_{in}$. On
a feature removal step, the features which have already been
selected are analyzed one at a time from the selected feature
pool and the significance of the change in the Wilks' lambda
is estimated. The feature with the least significance is removed
from the selected feature pool if the significance is less than
$F_{out}$. Since the appropriate values of $F_{in}$ and $F_{out}$ are not
known a priori, we examined a range of $F_{in}$ and $F_{out}$ values
and  chose  the  appropriate  thresholds  in  such  a  way  that  a 
minimum  number  of  features  were  selected  to  achieve  a  high 
accuracy  of  classification  by  LDA  for  the  training  sets.  More 
details  about  the  stepwise  linear  discriminant  analysis  and  its 
application  to  CAD  can  be  found  in  [10]-[12]. 
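
The selection criterion can be illustrated for a single feature, where the Wilks' lambda reduces to the ratio of the within-group sum of squares to the total sum of squares. The sketch below covers only this univariate case; the multivariate criterion and the F-statistic thresholds used in the stepwise procedure are deliberately omitted. The function names are hypothetical.

```python
def wilks_lambda_1d(values, labels):
    """Univariate Wilks' lambda: within-group SS divided by total SS.

    Smaller values indicate better separation of the groups. The feature
    is assumed not to be constant (total SS > 0).
    """
    mean = sum(values) / len(values)
    ss_total = sum((v - mean) ** 2 for v in values)
    ss_within = 0.0
    for group in set(labels):
        members = [v for v, l in zip(values, labels) if l == group]
        group_mean = sum(members) / len(members)
        ss_within += sum((v - group_mean) ** 2 for v in members)
    return ss_within / ss_total


def best_single_feature(features, labels):
    """Index of the feature (a list of values per feature) with the
    smallest univariate Wilks' lambda."""
    lambdas = [wilks_lambda_1d(f, labels) for f in features]
    return min(range(len(lambdas)), key=lambdas.__getitem__)
```

A feature that separates the two groups perfectly has a lambda of 0, and a feature whose group means coincide has a lambda of 1.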

D.  Performance  Analysis 

To evaluate the classifier performance, the training and
test discriminant scores were analyzed using receiver operating
characteristic (ROC) methodology [35]. The discriminant
scores of the malignant and benign masses were used as
decision variables in the LABROC1 program [36], which
fits a binormal ROC curve based on maximum likelihood
estimation. The classification accuracy was evaluated as the
area under the ROC curve, Az. For the ART2LDA classifier,
the discriminant scores of all case samples classified in the two
stages were combined. All masses classified into the malignant
group by the ART2 stage were assigned a constant positive
discriminant score higher than or equal to the most malignant
discriminant score obtained from the LDA stage.
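
The score combination can be sketched as follows. Note that the area estimate below is the nonparametric (Mann-Whitney) area under the empirical ROC curve, used here only as a simple stand-in; it is not the binormal maximum likelihood fit performed by LABROC1 and will generally give somewhat different values. The sketch also assumes that a higher discriminant score means a more malignant appearance. The function names are hypothetical.

```python
def combine_scores(lda_scores, art2_malignant):
    """Merge the scores of the two stages into one decision variable.

    Masses labeled malignant by the ART2 stage receive a constant score
    equal to the highest LDA score; all other masses keep their LDA
    score. lda_scores[k] may be None where art2_malignant[k] is True.
    """
    top = max(s for s, a in zip(lda_scores, art2_malignant) if not a)
    return [top if a else s for s, a in zip(lda_scores, art2_malignant)]


def empirical_auc(scores, is_malignant):
    """Nonparametric area under the ROC curve (ties count as one half)."""
    pos = [s for s, m in zip(scores, is_malignant) if m]
    neg = [s for s, m in zip(scores, is_malignant) if not m]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```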

TABLE I
Number of Selected Features for the Ten Data Groups
with the Corresponding $F_{in}$ and $F_{out}$ Parameters

Data Group No.   Number of selected features   $F_{in}$   $F_{out}$
1                12                            1.8        1.6
2                15                            2.4        2.2
3                13                            2.4        2.2
4                18                            2.4        2.2
5                14                            2.4        2.2
6                14                            2.1        1.8
7                13                            2.4        2.2
8                18                            1.8        1.6
9                14                            2.4        2.2
10               14                            2.4        2.2

The performance of ART2LDA was also assessed by estimation
of the partial area index ($A_z^{(0.9)}$) and compared with
the corresponding performance index of the LDA and BPN
classifiers. The partial area index ($A_z^{(0.9)}$) is defined as the area
that lies under the ROC curve but above a sensitivity threshold
of 0.9 ($TPF_0 = 0.9$), normalized to the total area above $TPF_0$,
($1 - TPF_0$). The partial $A_z^{(0.9)}$ indicates the performance of the
classifier in the high-sensitivity (low false negative) region,
which is most important for the clinical cancer detection task. In
addition, the performance of the LDA stage of the ART2LDA
classifier was evaluated by the estimation of the area under
the ROC curve, denoted as Az(LDA), for the case samples
passed on to the LDA classifier.
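
The definition of the partial area index can be made concrete with a sketch that operates on a piecewise-linear ROC curve. This is an illustration on empirical ROC points, not the binormal-model computation used in the study; the function name and the point-list representation are hypothetical.

```python
def partial_area_index(roc, tpf0=0.9):
    """Area under the ROC curve and above TPF = tpf0, over (1 - tpf0).

    roc is a list of (fpf, tpf) points sorted by fpf, running from
    (0, 0) to (1, 1) and joined by straight lines.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(roc, roc[1:]):
        if y1 <= tpf0 or x1 == x0:
            continue  # segment lies below the threshold or is vertical
        if y0 < tpf0:
            # Clip the segment at the point where it crosses tpf0.
            x0 = x0 + (tpf0 - y0) * (x1 - x0) / (y1 - y0)
            y0 = tpf0
        # Trapezoid between the segment and the horizontal line tpf0.
        area += (x1 - x0) * ((y0 - tpf0) + (y1 - tpf0)) / 2.0
    return area / (1.0 - tpf0)
```

A perfect classifier gives an index of 1. The chance diagonal leaves only a small triangle above the threshold (area 0.005 for a threshold of 0.9), which gives an index of 0.05.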

V.  Results 

In  this  section  the  ART2LDA  classification  results  for 
malignant  and  benign  masses  will  be  presented  and  compared 
with  those  of  the  LDA  or  BPN  classifiers.  The  important 
point  in  this  study  is  the  fact  that  the  test  subset  is  truly 
independent  of  the  training  subset.  Only  the  training  subset 
is  used  for  feature  selection  and  classifier  training,  and  only 
the  test  subset  is  used  for  classifier  validation.  In  order  to 
validate  the  prediction  abilities  of  the  classifier,  ten  different 
partitions of the training and test sets were used. A different
ART2LDA  classifier  was  trained  using  each  training  set  and 
the  corresponding  set  of  selected  features.  The  classification 
result  was  estimated  as  the  average  performance  for  the  ten 
partitions. 

For  a  given  partition  of  training  and  test  sets,  feature 
selection  was  performed  based  on  the  training  set  alone.  The 
feature  selection  results  for  the  ten  different  training  groups  are 
shown  in  Table  I.  The  average  number  of  selected  features  was 
14.  An  average  of  two  RLS  features  and  twelve  SGLD  features 
were  selected  for  each  of  the  training  sets  which  represented 
10%  of  all  RLS  features  and  2.3%  of  all  SGLD  features, 
respectively.  Both  types  of  features  (RLS  and  SGLD)  are 
necessary  in  order  to  obtain  good  classification.  The  most  often 
selected  RLS  features  for  the  ten  training  sets  were:  horizontal 
short  run  emphasis  (four  times),  horizontal  long  run  emphasis 
(six  times),  vertical  run  length  nonuniformity  (three  times), 
horizontal  run  length  nonuniformity  (three  times).  The  most 
often  selected  SGLD  texture  measures  for  the  ten  training  sets 
were:  inverse  difference  moment  (eight  times),  information 
measure  of  correlations  one  and  two  (19  times),  difference 
average (nine times), and correlation (ten times). For a given
texture measure, features at different angles or distances may
be selected, but these features are usually highly correlated so
that they can be considered to be similar and counted together
as described above.

Fig. 5. ART2LDA and LDA classification results for training and test sets
from data group three as a function of the generated number of classes.
Additionally, the results for the LDA stage from the ART2LDA classifier
are plotted.

A.  ART2LDA  Classification  Results 

For the ART2LDA classifier, the number of selected features
determines the dimensionality of the input vector of the ART2
classifier and the dimensionality of the LDA classifier. By
applying different values for the vigilance parameter, ART2
classifiers with different numbers of classes were obtained. In
this study, the vigilance parameter $p_{vig}$ was varied from 0.9
to 0.99, resulting in a range of 10 to 240 classes. The overall
performance of the ART2LDA classifier was evaluated for
different numbers of ART2 classes because different subsets
of the samples were separated and classified by ART2 when
$p_{vig}$ was varied. In Fig. 5, the classification results for the
ART2LDA are compared to the results from LDA alone for
the training and test set partition three. The classification
accuracy, Az, was plotted as a function of the number of
ART2 classes. For this training and test set partition, when
the  number  of  classes  was  between  20  and  60,  the  ART2LDA 
classifier  improved  the  classification  accuracy  for  the  test  set 
in  comparison  to  LDA.  As  the  number  of  classes  increased  to 
greater  than  60,  the  Az  value  increased  for  the  training  data 
set,  but  decreased  for  the  test  data  set  and  was  lower  than  that 
of  the  LDA  alone.  The  two  solid  lines  in  Fig.  5  show  the  Az 
values  for  the  LDA  stage  in  the  ART2LDA  classifier  for  both 
the  training  and  test  sets.  It  can  be  observed  that  the  test  Az 
for  the  LDA  stage  is  higher  than  the  Az  for  the  LDA  classifier 
alone,  but  not  as  high  as  Az  obtained  by  ART2LDA  when  the 
number  of  classes  is  small. 

In  Fig.  6  the  classification  results  of  LDA  and  ART2LDA 
for  the  partition  one  training  and  test  sets  are  shown.  In  this 
Fig. 6. ART2LDA and LDA classification results for training and test sets
from  data  group  one  as  a  function  of  the  generated  number  of  classes. 
Additionally  the  results  for  the  LDA  stage  from  the  ART2LDA  classifier 
are  plotted. 

case it appeared that in the test set there were two large
malignant outliers which degraded the LDA performance.
Only 15 classes at the ART2 stage of the ART2LDA were
enough to cluster the outliers into a separate malignant class
and to improve the performance of the LDA stage and the
overall result. The rest of the outliers required more ART2
classes before they were clustered into separate classes and
correctly classified as malignant. This is the reason for the
similar behavior of the classifiers for partitions three and one
in the range of 40 to 70 classes as seen in Figs. 5 and 6.
When the number of classes was less than 70, the test Az for
the LDA stage (Az(LDA)) was higher than that of the LDA alone, but
not as high as the Az for ART2LDA with fewer than 30 classes
(Fig. 6). The best Az values for the test data sets of the ten
training and test partitions are presented in Table II and Fig. 7.
The ART2LDA classifier achieved higher Az values than the
LDA alone in nine of the ten partitions. The average Az is
0.81  for  ART2LDA  and  0.78  for  LDA  alone.  The  standard 
deviations  of  the  Az  values  for  the  ten  groups  range  from 
0.03  to  0.05  for  the  ART2LDA  classifier  and  from  0.04  to 
0.05  for  the  LDA  classifier. 
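
The averages and the nine-of-ten count quoted above can be re-derived from the Table II entries. The short sketch below does only that arithmetic; the two lists are the LDA and ART2LDA columns as transcribed from Table II, and the helper names are ours, introduced for illustration.

```python
# Test-set Az values transcribed from Table II (data groups 1 to 10).
LDA = [0.77, 0.78, 0.74, 0.77, 0.77, 0.80, 0.80, 0.77, 0.77, 0.86]
ART2LDA = [0.83, 0.80, 0.78, 0.77, 0.78, 0.83, 0.81, 0.80, 0.80, 0.89]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def count_wins(a, b):
    """Number of partitions in which a is strictly higher than b."""
    return sum(1 for x, y in zip(a, b) if x > y)


mean_lda = round(mean(LDA), 2)
mean_art2lda = round(mean(ART2LDA), 2)
wins = count_wins(ART2LDA, LDA)
print(mean_lda, mean_art2lda, wins)
```

The single partition without a strict improvement is data group 4, where both classifiers reach the same Az.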

The performance of ART2LDA was also assessed by estimation
of the partial area under the ROC curve, Az(0.9), at a
TPF higher than 0.9. The results are presented in Table III
and Fig. 7. In the lower part of Fig. 7, the Az(0.9) values of the
test set for the corresponding ten partitions of training and test
sets are presented. The average test Az(0.9) value is 0.34 for the
ART2LDA and 0.27 for LDA. For nine of the ten partitions,
the Az(0.9) value was improved at the high-sensitivity operating
region (TPF > 0.9) of the ROC curve.
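
To make the partial-area index concrete, the following is a minimal sketch that computes the total area and the area above a TPF cut-off from an empirical, piecewise-linear ROC curve. It rests on two assumptions that are ours, not statements from the study: the curve is built directly from the scores rather than from a fitted ROC model, and the partial area is normalized by (1 - TPF0), which is consistent with reported values larger than 0.1 (the largest unnormalized area available above TPF = 0.9). All function names are hypothetical.

```python
def roc_points(malignant_scores, benign_scores):
    """Empirical ROC operating points (FPF, TPF), from (0, 0) to (1, 1)."""
    thresholds = sorted(set(malignant_scores) | set(benign_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(s >= t for s in malignant_scores) / len(malignant_scores)
        fpf = sum(s >= t for s in benign_scores) / len(benign_scores)
        points.append((fpf, tpf))
    return points


def area_above(points, tpf0=0.0):
    """Area between the piecewise-linear ROC curve and the line TPF = tpf0."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= tpf0 or x1 == x0:
            continue  # segment lies at or below the cut-off, or is vertical
        if y0 < tpf0:
            # Segment crosses the cut-off: keep only the part above it.
            x0 = x0 + (tpf0 - y0) / (y1 - y0) * (x1 - x0)
            y0 = tpf0
        area += 0.5 * ((y0 - tpf0) + (y1 - tpf0)) * (x1 - x0)
    return area


def total_area(malignant_scores, benign_scores):
    """Empirical area under the whole ROC curve."""
    return area_above(roc_points(malignant_scores, benign_scores), 0.0)


def partial_area(malignant_scores, benign_scores, tpf0=0.9):
    """Area above TPF = tpf0, normalized by (1 - tpf0) (assumed normalization)."""
    points = roc_points(malignant_scores, benign_scores)
    return area_above(points, tpf0) / (1.0 - tpf0)


# Perfect separation: both indices equal 1.
perfect_total = total_area([0.9, 0.8], [0.2, 0.1])
perfect_partial = partial_area([0.9, 0.8], [0.2, 0.1])
# Uninformative scores (all tied): chance diagonal.
chance_total = total_area([0.5, 0.5], [0.5, 0.5])
chance_partial = partial_area([0.5, 0.5], [0.5, 0.5])
```

Under this normalization a chance-level classifier scores 0.5 on the total area but only 0.05 on the partial index, which is why the partial values in Table III are much lower than the Az values in Table II.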

The  classifier  performance  was  also  evaluated  when  the 
ART2LDA  classifiers  were  designed  using  a  fixed  number 
TABLE II

Classifier Performance for the Ten Test Sets. The Az
Values Represent the Total Area Under the ROC Curve

Data Group No.   LDA    ART2LDA   BPN    ART2LDA(1)
 1               0.77   0.83      0.85   0.80
 2               0.78   0.80      0.82   0.77
 3               0.74   0.78      0.77   0.78
 4               0.77   0.77      0.75   0.77
 5               0.77   0.78      0.76   0.77
 6               0.80   0.83      0.82   0.81
 7               0.80   0.81      0.82   0.77
 8               0.77   0.80      0.74   0.75
 9               0.77   0.80      0.81   0.80
10               0.86   0.89      0.84   0.89
Mean             0.78   0.81      0.80   0.79

Fig. 7. Average Az classification results for the 10 test sets. The top graphs
represent the ART2LDA and LDA Az values for the total area under the
ROC curve. The bottom graphs represent the ART2LDA, ART2LDA(1), and
LDA Az(0.9) values for the partial area of the ROC curve above the true positive
fraction of 0.9. The horizontal axis is the data group number.


TABLE III

Classifier Results for the Ten Test Sets. The Az(0.9)
Values Represent the Partial Area of the ROC Curve
Above the True Positive Fraction of 0.9

Data Group No.   LDA    ART2LDA   BPN    ART2LDA(1)
 1               0.14   0.23      0.31   0.26
 2               0.17   0.21      0.28   0.27
 3               0.19   0.32      0.27   0.32
 4               0.19   0.21      0.19   0.21
 5               0.24   0.26      0.32   0.24
 6               0.27   0.38      0.27   0.44
 7               0.32   0.31      0.38   0.30
 8               0.32   0.34      0.25   0.38
 9               0.40   0.49      0.40   0.49
10               0.44   0.60      0.38   0.60
Mean             0.27   0.34      0.31   0.35

of ART2 classes. The Az and Az(0.9) results, averaged over
the ten test partitions, are presented in Table IV. The average
Az with the ART2LDA classifier, compared to that of LDA
alone, was again improved between 15 and 40 classes. The
maximum average Az of 0.80 was achieved between 20 and
40 classes. The average Az(0.9) results are improved for all


TABLE IV

Average Az and Average Az(0.9) Classification Results for the Ten Test
Sets. Classifiers Were Designed Using a Fixed Number of ART2 Classes

                 LDA    ART2LDA
No. of classes          15     20     30     40     50     60
Az               0.78   0.80   0.80   0.80   0.80   0.78   0.77
Az(0.9)          0.27   0.30   0.31   0.33   0.33   0.31   0.31

ART2LDA classifiers presented in Table IV. The maximum
average Az(0.9) value is 0.33 and it remains constant between
30 and 40 classes.

An alternative way to evaluate the performance of a classifier
is its classification accuracy when a decision threshold for
malignancy is selected based on the training set. For instance,
a decision threshold may be selected such that all positive
samples from the training set are classified correctly, i.e., at a
sensitivity of 100%. The ART2LDA with this decision threshold
is referred to as ART2LDA(1). For a given training and
test partitioning, ART2LDA classifiers with different numbers
of classes in the ART2 stage were obtained (Figs. 5 and 6). For
each of these models the decision threshold for a sensitivity of
100% was selected from the training set and the corresponding
ART2LDA(1) classifier was obtained. Then the ART2LDA(1)
classifier (with a specific number of classes in the ART2 stage)
that correctly classified the maximum number of malignant
masses in the test set was selected. By using all samples of
the test set, the Az value was calculated for the corresponding
ART2LDA model. The Az values for the ART2LDA(1) classifiers
for the test sets of the ten data partitionings are shown in
Tables II and III. For five of the partitions the overall Az value
for ART2LDA(1) is higher than that of LDA alone (Table II).
The average Az value was 0.79. The partial areas above the
TP fraction of 0.9, Az(0.9), for the ten test data sets obtained
by the ART2LDA(1) classifier are also shown in Fig. 7. The
ART2LDA(1) achieved the highest average Az(0.9) value of
0.35 compared to ART2LDA and LDA (Table III).
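
The threshold rule behind ART2LDA(1) can be illustrated with a few lines of code. This is a sketch under our own assumptions: a higher classifier score means more suspicious for malignancy, a sample is called malignant when its score is at or above the threshold, and the scores and function names are invented for the example.

```python
def threshold_for_full_sensitivity(train_scores, train_labels):
    """Largest threshold at which every malignant training sample is still
    called malignant (label 1 = malignant, 0 = benign)."""
    return min(s for s, y in zip(train_scores, train_labels) if y == 1)


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when 'score >= threshold' means malignant."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical training and test scores, for illustration only.
train_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
train_labels = [1, 1, 1, 0, 0, 0]
test_scores = [0.8, 0.3, 0.5, 0.1]
test_labels = [1, 1, 0, 0]

threshold = threshold_for_full_sensitivity(train_scores, train_labels)
train_result = sensitivity_specificity(train_scores, train_labels, threshold)
test_result = sensitivity_specificity(test_scores, test_labels, threshold)
```

In this toy example the training sensitivity is 100% by construction, while the test sensitivity drops to 50%, which shows why the threshold chosen on the training set has to be checked on independent test samples.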

B.  BPN  Classification  Results 

A  multilayer  perceptron  back-propagation  neural  network 
with  a  single  hidden  layer  and  a  single  output  node  was  used 
for  comparison  with  the  ART2LDA  classifier.  The  number 
of  selected  features  determined  the  number  of  input  nodes  to 
the  BPN.  The  same  ten  training/test  set  partitions  (as  in  the 
case  of  ART2LDA)  were  used  for  the  training  and  validation 
of  the  BPN  classifiers.  BPN’s  with  their  number  of  hidden 
nodes  ranging  from  two  to  ten  were  evaluated  to  obtain  the 
best  architecture.  Back-propagation  training  was  used.  Each 
of  the  BPN’s  was  trained  for  up  to  18000  training  epochs. 
At  every  1000  epochs  the  neural  network  weights  were  saved 
and  the  classification  result  for  the  corresponding  test  set  was 
evaluated.  This  design  procedure  was  repeated  for  each  of  the 
ten  training/test  groups.  For  each  group,  the  best  test  result 
among  all  the  BPN  architectures  (different  number  of  hidden 
nodes)  and  all  the  training  epochs  examined  was  selected. 
The average test Az over the ten groups for the BPN was
0.80, compared to 0.81 for ART2LDA (Table II). The standard
deviations of the Az values for the ten groups range from 0.04
to 0.05 for the BPN. The average partial Az(0.9) for the BPN
was 0.31, compared to 0.34 for ART2LDA (Table III). The
Az and Az(0.9) of the ART2LDA classifier were higher than
those of the BPN in six of the ten training/test groups.

VI.  Discussion 

In  the  present  study,  a  new  classifier  (ART2LDA)  was 
designed  and  applied  to  the  classification  of  malignant  and 
benign  masses.  The  results  indicated  that  the  ART2LDA 
classifier  had  better  generalizability  than  an  LDA  classifier 
alone. The ART2 classifier grouped the case samples that were
different from the main population into separate classes. The
minimum  number  of  classes  needed  to  start  the  clustering  of 
outliers  into  separate  classes  depended  on  how  different  the 
outliers  were  from  the  rest  of  the  sample  population.  For  the 
ten  different  partitions  of  training  and  test  sets  used  in  this 
study,  the  minimum  number  varied  between  13  and  15  classes. 
When  the  number  of  ART2  classes  was  less  than  this  minimum 
number  of  classes,  the  ART2  classifier  generated  only  mixed 
malignant-benign  classes  and  all  samples  were  transferred  to 
the  LDA  stage.  In  that  case,  the  ART2LDA  was  equivalent 
to  the  LDA  classifier  alone.  When  a  higher  number  of  classes 
were  generated,  an  increased  number  of  cases  that  might  be 
considered  outliers  of  the  general  data  population  was  removed 
(clustered  in  separate  classes).  For  the  ten  training  sets  used 
in  this  study,  the  malignant  outliers  were  gradually  removed 
when  the  number  of  classes  increased.  The  training  accuracy 
increased  when  the  number  of  classes  increased  and  Az  could 
reach the value of 1.0. However, a large number of ART2
classes led to overfitting the training sample set and poor
generalization in the test set. The classification accuracy of
ART2  for  the  test  set  tended  to  decrease  when  the  number  of 
classes  was  greater  than  about  70.  The  large  number  of  classes 
also  led  to  a  reduction  in  the  generalizability  of  the  second- 
stage  LDA;  the  training  of  LDA  with  a  small  number  of 
samples  would  again  result  in  overfitting  the  training  set,  and 
poor  generalizability  in  the  test  set.  This  effect  was  observed 
when  more  than  60  or  70  classes  were  generated  by  ART2 
(see  Figs.  5  and  6). 
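
The two-stage routing described above can be sketched as follows. This is not the ART2 network used in the study: the first stage is replaced by a simple leader-clustering rule with a cosine-similarity "vigilance" threshold, chosen only because it reproduces the qualitative behavior (a higher vigilance yields more classes). The second stage is passed in as an arbitrary scoring function standing in for the LDA. All names are hypothetical, and input vectors are assumed to be nonzero.

```python
import math


def unit(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def leader_cluster(samples, vigilance):
    """Assign each sample to its best-matching prototype if the match reaches
    the vigilance threshold; otherwise open a new cluster."""
    prototypes, assignment = [], []
    for s in map(unit, samples):
        sims = [cosine(s, p) for p in prototypes]
        if sims and max(sims) >= vigilance:
            assignment.append(sims.index(max(sims)))
        else:
            prototypes.append(s)
            assignment.append(len(prototypes) - 1)
    return prototypes, assignment


def pure_malignant_clusters(assignment, labels):
    """Clusters whose training members are all malignant (label 1)."""
    members = {}
    for c, y in zip(assignment, labels):
        members.setdefault(c, []).append(y)
    return {c for c, ys in members.items() if all(y == 1 for y in ys)}


def classify(x, prototypes, pure, second_stage):
    """Stage 1 calls a sample malignant if it falls in a pure malignant
    cluster; every other sample is scored by the second-stage classifier."""
    s = unit(x)
    nearest = max(range(len(prototypes)), key=lambda i: cosine(s, prototypes[i]))
    if nearest in pure:
        return ("stage1", 1.0)
    return ("stage2", second_stage(x))


train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = [0, 1, 1]
prototypes, assignment = leader_cluster(train, vigilance=0.95)
pure = pure_malignant_clusters(assignment, labels)
outlier_result = classify([0.1, 1.0], prototypes, pure, lambda x: 0.25)
mixed_result = classify([1.0, 0.05], prototypes, pure, lambda x: 0.25)
```

In a full design the second-stage classifier would be trained only on the samples left in the mixed classes, so raising the vigilance both removes more outliers and shrinks the second-stage training set, which is the overfitting trade-off discussed above.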

The  classification  accuracy  of  ART2LDA  increased  initially 
with  an  increased  number  of  classes  and  then  decreased 
after  reaching  a  maximum.  The  correct  classification  of  the 
outliers  by  the  ART2  in  combination  with  an  improvement 
in  the  classification  by  the  LDA  resulted  in  the  increased 
accuracy.  When  the  number  of  ART2  classes  was  further 
increased,  the  effects  of  overfitting  by  the  ART2  and  the  LDA 
became  dominant  and  the  prediction  ability  of  the  ART2LDA 
decreased.  In  some  cases  the  second-stage  LDA  prediction 
was  much  worse  than  the  ART2.  In  other  cases  the  ART2 
could  not  generalize  well.  The  generation  of  a  high  number  of 
classes  is  therefore  impractical  and  unnecessary  both  from  a 
computational  and  a  methodological  point  of  view. 

For  the  optimal  number  of  classes  (usually  less  than  50  for 
the  data  sets  used)  the  Az  value  for  the  second-stage  LDA  in 
the  ART2LDA  was  better  than  an  LDA  classifier  alone,  but  it 
was  not  as  good  as  the  overall  Az  from  the  ART2LDA.  It  is 
evident  that  the  ART2  was  a  useful  classifier  for  improvement 
of  the  second-stage  classification. 
When the partial area of the ROC curve above the true positive
fraction (TPF) of 0.9 (Az(0.9)) was considered as a measure
of classification accuracy, the advantage of ART2LDA over
LDA alone became even more evident. By removing and correctly
classifying the outliers, the accuracy of the classification
was increased at the high-sensitivity end of the curve.

The  classifier  performance  was  evaluated  when  the 
ART2LDA  classifiers  were  designed  using  a  fixed  number 
of  ART2  classes.  The  results  showed  improved  performance 
of  the  ART2LDA  in  a  range  between  20  and  40  ART2 
classes. Both the average Az and the average Az(0.9) reached
a maximum within this region, and the maximum average Az
and the average Az(0.9) values remained unchanged between 30
and 40 classes. These results indicated that the performance
of a hybrid ART2LDA classifier was robust and stable and
could be potentially useful in real clinical applications.

We have performed statistical tests with the CLABROC
program to estimate the significance of the differences between
the Az values from the ART2LDA, the LDA alone, and the
BPN, as well as of the differences in the partial Az(0.9) from the
three classifiers. The statistical tests were performed for each
individual data set partition because the correlation among the
data sets from the different partitions precludes the use of
Student's paired t test with the ten partitions. We found that the
differences in both cases did not reach statistical significance
because of the small number of test samples and thus the large
standard deviation in the Az values. However, the consistent
improvements in Az and Az(0.9) by the ART2LDA (nine out of
ten data set partitions in both cases for LDA and six out of
ten data set partitions in both cases for BPN) suggest that the
improvement  was  not  by  chance  alone,  and  that  the  accuracy 
of  a  classification  task  could  be  improved  by  the  use  of  an 
ART2  network.  In  addition,  one  advantage  of  the  ART2LDA 
is  that  the  training  process  is  more  efficient  than  that  of  the 
BPN,  especially  when  there  is  a  subset  of  outlying  samples.  In 
such  a  case,  the  BPN  will  require  a  large  number  of  training 
epochs  to  minimize  the  error  function. 

ART2LDA  can  be  trained  to  classify  the  sample  cases  into 
more  than  two  classes,  such  as  a  class  of  normal  tissue  regions 
in  addition  to  malignant  and  benign  masses.  There  will  be  an 
increase  in  the  complexity  of  training  and  a  larger  training 
sample  size  will  be  desired,  but  these  requirements  will  be 
comparable  for  the  different  classifiers.  In  a  clinical  situation, 
if  the  classification  task  is  performed  on  all  computer-detected 
lesions,  the  classifier  has  to  distinguish  the  falsely  detected 
normal  tissue  from  malignant  or  benign  lesions.  However, 
it may be noted that a classifier that can distinguish only
malignant and benign masses is applicable to the scenario
in which the radiologist identifies a suspicious lesion on the
mammogram and would like to have a second opinion about its
likelihood of malignancy before making a diagnostic decision.
Therefore,  the  development  of  a  classifier  that  can  differentiate 
malignant  and  benign  masses  is  the  research  of  interest  for 
many  investigators. 

Similarly,  ART2  can  be  trained  to  discover  and  remove  a 
pure  benign  mass  class.  The  approach  will  be  similar  to  the 
task  of  classifying  and  removing  the  pure  malignant  classes, 




IEEE  TRANSACTIONS  ON  MEDICAL  IMAGING,  VOL.  18,  NO.  12,  DECEMBER  1999 


as described in this study. However, our approach of removing
the malignant classes will reduce the chance of misclassification
of malignant masses. In breast cancer detection, the cost
of a false negative (missed cancer) is very high. Therefore, our
goal in classifier design is to be conservative. By removing
the malignant classes in the first stage, any sample misclassified
into these classes will be regarded as malignant. The remaining
classes will be classified again with the second-stage classifier
so malignant masses will be less likely to be missed.

The problem of classification of malignant and benign
masses has been studied by many investigators. Rangayyan
et al. [15] used a Mahalanobis distance classifier (a modification
of an LDA classifier) and the leave-one-out method to evaluate
the classification of 54 masses. Fogel et al. [16] compared
LDA and BPN classifiers using the leave-one-out method and
139 masses (malignant and benign classification). Highnam
et al. [17] used a morphological feature called a halo to
classify 40 masses as malignant and benign. Huo et al. [22]
employed BPN and a rule-based classifier to classify 95 masses
using the leave-one-out evaluation method. Sahiner et al. [12]
used an LDA classifier and the leave-one-out method to
classify 168 masses. An important difference between the
classifier designed in this study and the previous studies in
the CAD field is the method of feature selection. In the
above-mentioned studies [12], [15]-[17], [22] and several other
published studies [18]-[21] the features were selected from the
entire  data  set  first,  and  then  the  data  set  was  partitioned  into 
training  and  test  sets.  This  meant  that  at  the  feature  selection 
stage  of  the  classifier  design,  the  entire  data  set  was  used  as  a 
training  set.  Depending  on  the  distribution  of  the  features  and 
the  total  number  of  samples  used,  the  test  results  in  these 
studies  might  be  optimistically  biased  [37].  In  our  current 
study,  the  entire  data  set  was  initially  partitioned  into  training 
and  test  sets  and  then  feature  selection  was  performed  only 
on  the  training  set.  This  method  will  result  in  a  pessimistic 
estimate  of  the  classifier  performance  when  the  training  set  is 
small  [37].  However,  it  will  provide  a  more  conservative  but 
realistic  estimation  of  the  classifier  performance  in  the  general 
patient  population.  We  can  expect  that  the  performance  would 
be  improved  if  the  classifier  in  this  study  were  designed  using 
a  large  data  set.  Since  our  main  purpose  in  this  study  was 
to  compare  the  ART2LDA  classifier  with  the  commonly  used 
LDA  and  BPN,  we  did  not  attempt  to  quantify  how  pessimistic 
our  results  were  in  this  study. 
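
The order of operations discussed above (partition first, then select features on the training set only) can be sketched as follows. The ranking criterion used here, the absolute difference of class means divided by the pooled standard deviation, is a hypothetical stand-in and not the feature selection method of this study; the data and function names are likewise invented for illustration.

```python
import random


def partition(n_samples, train_fraction, seed):
    """Random split of sample indices into disjoint training and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


def class_separation(values, labels):
    """Stand-in ranking score: |difference of class means| / pooled std.
    Both classes must be present in the supplied samples."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    ss = sum((v - mp) ** 2 for v in pos) + sum((v - mn) ** 2 for v in neg)
    return abs(mp - mn) / ((ss / len(values)) ** 0.5 + 1e-12)


def select_features(X, y, train_idx, k):
    """Rank features using training rows only; test rows are never inspected."""
    y_train = [y[i] for i in train_idx]
    scores = []
    for j in range(len(X[0])):
        column = [X[i][j] for i in train_idx]
        scores.append((class_separation(column, y_train), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5], [1.1, 6], [0.9, 5], [1.0, 6],
     [0.0, 5], [0.1, 6], [-0.1, 5], [0.0, 6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

train_idx, test_idx = partition(len(X), train_fraction=0.75, seed=0)
selected = select_features(X, y, train_idx, k=1)
```

Calling select_features with all indices instead of train_idx would reproduce the design that the text identifies as optimistically biased, because the test samples would then influence which features the classifier sees.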

The most important contribution of this paper is to introduce
a new approach that utilizes a two-stage unsupervised-supervised
hybrid classifier. We believe that the hybrid
approach will improve classification when the sample distribution
contains subpopulations that may be difficult for a single
classifier to classify. It will be useful for similar classification
tasks although different classifiers may be used in each stage
of the hybrid structure.

VII.  Conclusion 

A  new  classifier  combining  an  unsupervised  ART2  and 
a  supervised  LDA  has  been  designed  and  applied  to  the 
classification  of  malignant  and  benign  masses.  A  data  set 


consisting  of  348  films  (179  malignant  and  169  benign) 
was  randomly  partitioned  into  training  and  test  subsets.  Ten 
different  random  partitions  were  generated.  For  each  training 
set,  texture  features  were  extracted  and  feature  selection  was 
performed.  An  average  of  features  were  selected  for  each 
group.  A  hybrid  ART2LDA  classifier,  an  LDA,  and  a  BPN 
were trained by using each of the ten training sets. The Az
value under the ROC curve for the test sets, averaged over
the ten partitions, was higher for ART2LDA (Az = 0.81)
compared to those of the LDA alone (Az = 0.78) and of the
BPN (Az = 0.80). A greater improvement was obtained when
the partial ROC area above a true-positive fraction of 0.9 was
considered. The average partial Az for ART2LDA was 0.34,
as compared to 0.27 for LDA and 0.31 for BPN. Additionally,
for the ART2LDA classifiers that correctly classified the
maximum number of malignant masses in the test sets with
decision threshold defined with the training set, the average
partial Az was 0.35. These results indicate that the hybrid
classifier  is  a  promising  approach  for  improving  the  accuracy 
of  classifiers  for  CAD  applications. 
Analysis of interval change is a useful technique for detection of abnormalities in mammographic
interpretation.  Interval  change  analysis  is  routinely  used  by  radiologists  and  its  importance  is 
well-established  in  clinical  practice.  As  a  first  step  to  develop  a  computerized  method  for  interval 
change  analysis  on  mammograms,  we  are  developing  an  automated  regional  registration  technique 
to  identify  corresponding  lesions  on  temporal  pairs  of  mammograms.  In  this  technique,  the  breast  is 
first  segmented  from  the  background  on  the  current  and  previous  mammograms.  The  breast  edges 
are  then  aligned  using  a  global  alignment  procedure  based  on  the  mutual  information  between  the 
breast  regions  in  the  two  images.  Using  the  nipple  location  and  the  breast  centroid  estimated 
independently  on  both  mammograms,  a  polar  coordinate  system  is  defined  for  each  image.  The 
polar  coordinate  of  the  centroid  of  a  lesion  detected  on  the  most  recent  mammogram  is  used  to 
obtain  an  initial  estimate  of  its  location  on  the  previous  mammogram  and  to  define  a  fan-shaped 
search  region.  A  search  for  a  matching  structure  to  the  lesion  is  then  performed  in  the  fan-shaped 
region  on  the  previous  mammogram  to  obtain  a  final  estimate  of  its  location.  In  this  study,  a 
quantitative  evaluation  of  registration  accuracy  has  been  performed  with  a  data  set  of  74  temporal 
pairs  of  mammograms  and  ground-truth  correspondence  information  provided  by  an  experienced 
radiologist.  The  most  recent  mammogram  of  each  temporal  pair  exhibited  a  biopsy-proven  mass. 
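
The polar-coordinate step can be illustrated with a short sketch. The abstract does not state the origin or the reference axis of the coordinate system, so the choices below are assumptions made for the example: the nipple is the origin, the nipple-to-centroid direction is the zero-angle axis, the radial distance is transferred unscaled, and the fan-shaped region is described by hypothetical half-widths dr and dtheta.

```python
import math


def to_polar(point, nipple, centroid):
    """Polar coordinates of a point, with the nipple as origin and the
    nipple-to-centroid direction as the zero-angle axis (assumed convention)."""
    axis = math.atan2(centroid[1] - nipple[1], centroid[0] - nipple[0])
    dx, dy = point[0] - nipple[0], point[1] - nipple[1]
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx) - axis
    theta = (theta + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return r, theta


def from_polar(r, theta, nipple, centroid):
    """Inverse of to_polar for a given image's nipple and centroid."""
    axis = math.atan2(centroid[1] - nipple[1], centroid[0] - nipple[0])
    return (nipple[0] + r * math.cos(theta + axis),
            nipple[1] + r * math.sin(theta + axis))


def initial_estimate(lesion, cur_nipple, cur_centroid, prev_nipple, prev_centroid):
    """Carry the lesion's polar coordinates from the current to the previous image."""
    r, theta = to_polar(lesion, cur_nipple, cur_centroid)
    return from_polar(r, theta, prev_nipple, prev_centroid)


def in_fan(point, center_r, center_theta, nipple, centroid, dr, dtheta):
    """True if the point lies within +/- dr and +/- dtheta of the fan center."""
    r, theta = to_polar(point, nipple, centroid)
    dth = (theta - center_theta + math.pi) % (2 * math.pi) - math.pi
    return abs(r - center_r) <= dr and abs(dth) <= dtheta


# Hypothetical pixel coordinates: the previous image is shifted and rotated 90 degrees.
lesion_r, lesion_theta = to_polar((5.0, 5.0), (0.0, 0.0), (10.0, 0.0))
estimate = initial_estimate((5.0, 5.0), (0.0, 0.0), (10.0, 0.0),
                            (100.0, 100.0), (100.0, 110.0))
inside = in_fan(estimate, lesion_r, lesion_theta,
                (100.0, 100.0), (100.0, 110.0), dr=2.0, dtheta=0.2)
```

The search for the matching structure would then be restricted to candidate locations for which in_fan returns True.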

We  have  investigated  the  usefulness  of  correlation  and  mutual  information  as  search  criteria  for 
determining corresponding regions on mammograms for the biopsy-proven masses. In 85% of the
cases (63/74 temporal pairs) the region on the previous mammogram that corresponded to the mass
on the current mammogram was correctly identified. The region centroid identified by the registration
technique had an average distance of 2.8 ± 1.9 mm from the centroid of the radiologist-identified
region. These results indicate that our new registration technique may be useful for
establishing correspondence between structures on current and previous mammograms. Once such
a correspondence is established, an interval change analysis could be performed to aid in both
detection and classification of abnormal breast densities. © 1999 American Association of
Physicists in Medicine. [S0094-2405(99)00612-4]

Key  words:  image  registration,  computer-aided  diagnosis,  computer  vision,  interval  change,  breast 
cancer 
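
The two search criteria named in the abstract, correlation and mutual information, can be written down compactly. The sketch below uses the standard definitions on two equally sized regions given as flat lists of quantized gray levels; the quantization, the use of natural logarithms, and the function names are assumptions made for the example, not details taken from the study.

```python
import math
from collections import Counter


def mutual_information(patch_a, patch_b):
    """Mutual information (in nats) between two equally sized lists of
    quantized gray levels, estimated from their joint histogram."""
    n = len(patch_a)
    joint = Counter(zip(patch_a, patch_b))
    pa, pb = Counter(patch_a), Counter(patch_b)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        mi += p_ab * math.log(p_ab * n * n / (pa[a] * pb[b]))
    return mi


def correlation(patch_a, patch_b):
    """Pearson correlation coefficient; returns 0.0 if either patch is constant."""
    n = len(patch_a)
    ma, mb = sum(patch_a) / n, sum(patch_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(patch_a, patch_b))
    va = sum((a - ma) ** 2 for a in patch_a)
    vb = sum((b - mb) ** 2 for b in patch_b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


template = [0, 0, 1, 1]
mi_same = mutual_information(template, template)
mi_flat = mutual_information(template, [3, 3, 3, 3])
corr_same = correlation(template, template)
corr_inverted = correlation(template, [1, 1, 0, 0])
mi_inverted = mutual_information(template, [1, 1, 0, 0])
```

The inverted patch shows one difference between the two criteria: its correlation with the template is -1, whereas its mutual information equals that of a perfect match, because mutual information measures statistical dependence rather than a linear gray-level relationship.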


I. INTRODUCTION

Mammography  is  currently  the  most  effective  method  for 
early  breast  cancer  detection.^’^  A  variety  of  computer-aided 
diagnosis  (CAD)  techniques  have  recently  been  developed  to 
detect  mammographic  abnormalities  and  to  distinguish  be¬ 
tween  malignant  and  benign  lesions.^"*  Knowledge  from  di¬ 
verse  areas  such  as  signal  and  image  processing,  pattern  rec¬ 
ognition,  computer  vision,  artificial  intelligence,  and  neural 
networks  has  been  used  to  develop  algorithms  to  be  imple¬ 
mented  within  a  CAD  scheme.  Varying  degrees  of  success 
for  these  approaches  have  been  reported  in  the  literature.  One 
common  feature  of  most  of  these  CAD  techniques  is  that 
they use a single mammogram for analysis. However, some
malignancies may only manifest as a new density on mammograms
without associated calcifications or masses; others
distinguish themselves from benign lesions only by their
relatively rapid changes in size. Therefore, radiologists routinely
use several mammographic views along with mammograms
obtained in previous years for detecting and evaluating
breast lesions and for identifying interval changes. The
importance  of  interval  change  analysis  in  mammographic  in¬ 
terpretation  has  been  established  in  clinical  practice.^’^®  It 
can  be  expected  that  analysis  of  changes  in  mammographic 
features  between  current  and  previous  mammograms  of  the 
patient  will  also  be  an  important  component  of  a  CAD  sys¬ 
tem  for  both  the  detection  and  the  classification  tasks.  The 
ability  for  automated  analysis  of  interval  changes  would  fur¬ 
ther  the  ability  of  CAD  to  offer  an  objective  second  opinion. 
This improvement, in turn, could increase the positive predictive
value of mammography, reduce the number of benign
biopsies, and hence reduce both cost and patient morbidity.

While  a  number  of  CAD  schemes  use  only  a  single  mam¬ 
mogram,  the  simultaneous  use  of  more  than  one  mammo¬ 
gram  has  been  under  investigation  for  some  time.  Several 
researchers  have  used  views  of  the  contra-lateral  breast  for 
detecting masses and developing densities. For instance, Yin
et al. have utilized architectural asymmetry between the
right  and  left  breasts  to  detect  masses.  While  it  is  widely 
accepted  that  interval  changes  in  mammographic  features  are 
very  useful  for  both  detection  and  classification  of  breast 
abnormalities,  the  development  of  CAD  techniques  to  use 
this  information  has  achieved  limited  success.^^'^®  Sallam 
and  Bowyer^^  have  proposed  a  warping  technique  for  mam¬ 
mogram  registration.  They  manually  obtained  control  points 
and  calculated  a  mapping  function  for  mapping  each  point  on 
the  current  mammogram  to  a  point  on  the  previous  mammo¬ 
gram.  The  mapping  function  was  obtained  based  on  local 
affine  transformations,  as  well  as  interpolation  and  surface 
fitting  techniques.  A  drawback  of  this  technique  is  the  need 
for manual demarcation of control points. Brzakovic et al.
have investigated a three-step method for comparison of
most  recent  and  previous  mammograms.  They  first  registered 
two  mammograms  using  the  method  of  principal  axis,  and 
partitioned  the  current  mammogram  using  a  hierarchical 
region-growing  technique.  The  breast  regions  in  the  two 
mammograms  were  aligned  with  respect  to  each  other  by 
means  of  translation,  rotation,  and  scaling.  Although  the 
technique  was  evaluated  on  a  total  of  64  images  obtained 
from  eight  cases,  this  work  mainly  aimed  toward  detecting 
cancerous changes in breast tissue and, therefore, no quantitative
analysis of registration accuracy was presented. Vujovic
and co-workers have proposed a multiple-control-point
technique for mammogram registration. They first
determined  several  control  points  independently  on  the  cur¬ 
rent  and  previous  mammograms  based  on  the  intersection 
points  of  prominent  anatomical  structures  in  the  breast.  A 
correspondence  between  these  control  points  was  established 
based  on  a  search  in  a  local  neighborhood  around  the  control 
point  of  interest.  In  a  more  recent  publication,^^  they  have 
evaluated  their  approach  for  establishing  the  correspondence 
between  control  points  extracted  from  two  mammograms  us¬ 
ing  29  temporal  image  pairs,  and  presented  a  qualitative 
evaluation  based  on  an  observer  study.  They  have  demon¬ 
strated  that  91%  of  103  computer-matched  control  points 
were  in  agreement  with  those  matched  by  a  radiologist.  An 
important  assumption  of  their  work  was  that  the  distances 
between  the  control  points  did  not  change  significantly  be¬ 
tween  the  two  mammograms.  However,  this  assumption  is 
not  necessarily  a  valid  one.  Variations  in  compression  could 
potentially  cause  a  large  variation  in  the  relative  distances 
between  the  control  points.  Furthermore,  the  control  points 
representing  the  intersections  of  elongated  structures  do  not 
always  have  correspondences  on  the  two  mammograms. 
Most of these points are two-dimensional projection images
of structures at different depths of an elastic and compressible
three-dimensional breast. The projected intersection
points can thus vary from image to image and are not invariant
landmarks. As noted by the authors, the potential control
points are not points that are naturally selected by a radiologist
when examining mammograms. Hence, the significance
of these points is debatable.

An important factor that may limit the success of the
above-mentioned techniques is that the extraction of any
meaningful information from previous mammograms first requires
a common frame of reference between the current and
previous mammograms. Several complicating factors confound
obtaining such a frame of reference. These factors
include  differences  in  breast  compression  and  positioning  be¬ 
tween  the  current  and  previous  mammograms,  differences  in 
the  imaging  technique  between  the  two  examinations,  and 
changes  in  breast  structure,  size,  and  tissue  density  between 
the  two  images  with  patient  age.  As  a  result,  the  mammo¬ 
graphic  appearance  of  breast  tissue  on  the  current  and  previ¬ 
ous  mammograms  of  the  same  patient  may  vary  consider¬ 
ably.  Although  these  variabilities  have  not  been  quantified 
experimentally,  they  can  be  observed  easily  from  most  mam¬ 
mograms.  Conventional  registration  techniques  work  well 
for  applications  involving  rigid  objects.  Because  of  the  elas¬ 
ticity  of  the  breast  tissue,  the  absence  of  obvious  landmarks, 
and  the  large  variability  in  the  relative  positions  of  the  breast 
tissues  projected  onto  the  mammogram  from  one  examina¬ 
tion  to  the  other,  these  techniques  may  not  be  optimal  for 
registration  of  breast  images. 

In  mammographic  interpretation,  a  radiologist  routinely 
compares  the  current  mammogram  with  previous  mammo¬ 
grams  (if  available)  of  the  same  view  in  order  to  detect 
changes  in  mammographic  features.  For  example,  if  a  mass 
is  detected  in  the  current  mammogram,  the  radiologist 
searches  for  that  mass  in  the  previous  mammogram  to  deter¬ 
mine  if  this  is  a  new  or  developing  density.  If  the  corre¬ 
sponding  mass  is  found  on  the  previous  mammogram,  then 
the  radiologist  compares  the  current  and  previous  mass  size 
and  estimates  if  the  mass  has  increased  in  size.  To  facilitate 
these  comparisons,  we  plan  to  develop  automated  methods  to 
detect  the  interval  changes  as  a  part  of  a  computer-aided 
diagnostic  system.  As  a  first  step,  we  have  developed  a  novel 
method  for  automatic  registration  of  lesions  on  temporal 
pairs  of  mammograms.  In  our  approach,  the  computer  emu¬ 
lates  the  search  method  used  by  many  radiologists  for  finding 
corresponding  structures  on  mammograms.  The  method  aims 
at  registering  a  small  region  containing  a  suspected  mass  on 
the  most  recent  mammogram  of  the  patient  with  one  on  a 
mammogram  obtained  from  a  previous  year.  Our  regional 
registration  technique  involves  three  steps:  (1)  identification 
of  a  suspicious  structure  on  the  most  recent  mammogram,  (2) 
initial  estimation  of  the  location  on  a  previous  mammogram 
of  the  region  corresponding  to  the  suspicious  structure  and 
the  definition  of  a  search  region  which  encloses  the  object  of 
interest  on  the  previous  mammogram,  and  (3)  accurate  iden¬ 
tification  of  the  location  of  the  matched  object  within  the 
search  region.  After  the  two  matched  lesions  are  identified, 
their  characteristic  features  can  be  automatically  extracted 
and interval changes estimated. In the present study, we focused
on the development and evaluation of the regional
registration technique rather than on solving the entire interval
change analysis problem. The subsequent steps in the interval
change analysis are beyond the scope of this study.

In  the  following  sections  we  will  provide  a  detailed  de¬ 
scription  of  our  regional  registration  technique  for  temporal 
registration  of  mammograms  and  the  results  of  a  quantitative 
evaluation using a data set of 74 temporal image pairs. Although
we evaluated a semiautomated version of the technique
in this preliminary study, it can be fully automated by
incorporating a nipple detection step so that no user interaction
will be required.

Fig. 1. Regional registration technique for determining an object on the
previous mammogram which corresponds to a suspicious object on the most
recent or current mammogram.

II.  MATERIALS  AND  METHODS 

A.  Regional  registration  and  mammogram 
correspondence 

As  the  term  indicates,  regional  registration  is  a  local 
rather  than  a  global  registration  technique.  It  is  a  multistep 
procedure  and  utilizes  computer-detected  objects  in  the  most 
recent  (hereafter  termed  current)  mammogram.  In  the  context 
of  this  paper,  a  current  mammogram  is  either  the  latest  mam¬ 
mogram  of  the  patient,  or  the  latest  mammogram  before  bi¬ 
opsy.  The  detected  objects  could  be  either  true  masses  (be¬ 
nign  or  malignant)  or  false  positives  (normal  breast 
structures).  Regional  registration  then  finds  a  matching  ob¬ 
ject  on  a  previous  mammogram.  The  three  major  steps  in 
regional  registration  are  illustrated  in  Fig.  1  and  details  of  the 
technique  are  described  below. 

In  the  first  step  of  regional  registration,  the  breast  region 
is  segmented  from  the  background  on  both  the  current  and 
the  previous  mammograms.  For  this  purpose  we  have  used  a 
breast  boundary  detection  algorithm  previously  developed  in 
our  laboratory.^^’^®  This  algorithm  could  successfully  track 
the  breast  boundaries  in  over  90%  of  the  1000  mammograms 
in  a  previous  study.  It  performed  reliably  on  all  the  images  in 
our  database.  After  extracting  the  breast  border  from  the 
mammogram,  the  location  of  the  nipple  is  estimated  on  both 
the  current  and  the  previous  mammograms.  Any  automated 
method^^’^^  can  be  used  for  finding  the  nipple  location.  How¬ 
ever,  in  this  study,  the  nipple  location  was  manually  identi¬ 
fied  by  a  radiologist  for  all  images  in  our  data  set.  The  breast 
border  and  the  nipple  location  now  form  the  basis  of  a  global 
breast  alignment  (GBA)  procedure  illustrated  in  Fig.  2.  Since 
the  sizes  and  the  orientations  of  the  two  images  could  vary 
between  the  current  and  previous  mammograms,  a  common 
frame of reference is needed. The GBA procedure has been
devised specifically to provide such a frame of reference. We
first define a new frame of reference with the nipple location
on the current mammogram ($N_c$) as the origin. The previous
mammogram is translated so that its nipple location ($N_p$)
aligns with the origin in the common frame of reference as
shown in Fig. 2. Using the origin as the pivot point, we rotate
the previous mammogram to align the breast regions in the
two images.

Fig. 2. Global breast alignment based on the mutual information between
the two breast regions. $N_c$: nipple location in current mammogram;
$N_p$: nipple location in previous mammogram; $N$: nipple location for both
current and previous mammograms after translating them to the common
frame of reference. The previous mammogram is rotated until the mutual
information between the two mammograms is maximized.

We  have  evaluated  two  different  methods  for  estimation 
of  the  optimum  rotation  angle.  The  first  method  is  based  on 
maximization  of  the  overlap  area,  and  the  second  method  is 
based  on  maximization  of  the  mutual  information  (MI)^^’^"^ 
between  the  two  segmented  breast  regions.  To  determine  the 
MI,  we  first  rescale  the  breast  portion  of  both  mammograms 
to a 0-255 gray scale. For a given rotation angle $\theta$, the
two-dimensional (2D) histogram of the gray levels, $h_\theta(i,j)$,
for the corresponding pixels on the current mammogram and
the previous mammogram is constructed. Here $i$ refers to the
gray level on the current mammogram and $j$ refers to the
gray level on the previous mammogram rotated by an angle
$\theta$. The probability density of the gray scale co-occurrences is
estimated from the 2D histogram as

$$f_\theta(i,j) = \frac{h_\theta(i,j)}{\sum_{m,n} h_\theta(m,n)}, \tag{1}$$

where $0 \le i,j \le 255$ and $0 \le m,n \le 255$. The mutual information
($\mathrm{MI}_\theta$) between the two images for a specific rotation angle $\theta$
is computed as

$$\mathrm{MI}_\theta = \sum_{i,j} f_\theta(i,j) \log_2 \frac{f_\theta(i,j)}{\sum_{n} f_\theta(i,n) \, \sum_{m} f_\theta(m,j)}, \tag{2}$$

where the two sums in the denominator are the marginal gray-level
densities of the current and the rotated previous mammogram.
The above-mentioned procedure is repeated for several rotation
angles and the angle $\theta$ which provides the maximum
mutual information is chosen for global breast alignment of
the previous mammogram and the current mammogram.
Note that while the area overlap method for GBA uses the
binary image after segmentation, the MI-based method uses
the original gray scale image. The effects of the two methods
on the accuracy of regional registration will be discussed
later in Sec. IV. Once the two images are aligned in the
common frame of reference, the centroid of the breast region
is estimated, and the nipple-centroid axis is defined for both
mammograms. For comparison we also show in Sec. III
regional registration results based on computing the centroids
of the two breast regions without global breast alignment.
The nipple-centroid axis forms the basis for the second step
of regional registration.

Fig. 3. Polar coordinate system defined using the nipple location and the
nipple-centroid axis. The search region for finding a matching object on the
previous mammogram is shown as the shaded region.
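
The mutual information criterion used for global breast alignment can be illustrated with a minimal sketch. It assumes both images are already rescaled to integer gray levels in 0-255 and, for simplicity, uses every pixel rather than only the segmented breast region. The function names `mutual_information` and `best_rotation` are hypothetical, and the rotation of the previous image about the nipple is assumed to have been done elsewhere.

```python
import numpy as np

def mutual_information(current, previous, levels=256):
    """Mutual information (bits) of two equally shaped gray-level images."""
    current = np.asarray(current).ravel()
    previous = np.asarray(previous).ravel()
    # 2D histogram h(i, j) of co-occurring gray levels.
    h, _, _ = np.histogram2d(current, previous, bins=levels,
                             range=[[0, levels], [0, levels]])
    f = h / h.sum()                     # joint density
    fi = f.sum(axis=1, keepdims=True)   # marginal of the current image
    fj = f.sum(axis=0, keepdims=True)   # marginal of the previous image
    nz = f > 0                          # empty bins contribute nothing
    return float(np.sum(f[nz] * np.log2(f[nz] / (fi @ fj)[nz])))

def best_rotation(current, rotated_candidates):
    """Pick the rotation angle whose rotated previous image maximizes MI.

    rotated_candidates maps each candidate angle to the previous image
    already rotated by that angle (hypothetical input format).
    """
    scores = {angle: mutual_information(current, image)
              for angle, image in rotated_candidates.items()}
    angle = max(scores, key=scores.get)
    return angle, scores[angle]
```

In practice the histogram would be restricted to pixels inside both breast regions, which this sketch leaves out.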

In  the  second  step,  suspicious  regions  are  automatically 
segmented from the breast region on the current mammogram.
This can be accomplished by using a density-weighted
contrast  enhancement  (DWCE)  technique^^  previously  de¬ 
veloped  in  our  laboratory.  While  the  use  of  the  DWCE  tech¬ 
nique  is  not  critical  for  regional  registration,  it  does  help 
automate  the  entire  procedure.  Alternatively,  a  radiologist 
can  manually  identify  a  suspicious  object  or  a  region  of  in¬ 
terest  on  the  current  mammogram  and  the  regional  registra¬ 
tion  technique  can  be  used  to  identify  a  corresponding  region 
on  the  previous  mammogram.  Once  suspicious  objects  have 
been  identified  on  the  current  mammogram,  the  centroid  of 
each  object  is  estimated.  A  polar  coordinate  system  is  then 
defined  using  the  nipple  as  the  origin  and  the  nipple-centroid 
axis  as  the  0°  axis  on  both  images.  This  is  illustrated  in  Fig. 
3. The location of the centroid of a suspicious object on the
current mammogram is determined as $(r, \theta)$. We then compute
two scale factors: the radial scale factor $s_1$ and the
angular scale factor $s_2$. These scale factors have been devised
to provide a first-order correction for factors such as
breast compression differences between the current and previous
mammograms, differences in image magnification and
size, and changes in overall breast shape between the two
images. The radial scale factor $s_1$ is estimated as the ratio of
the nipple-centroid distances on the previous and current
images. The angular scale factor $s_2$ is estimated as the ratio
of the angular width of the breast on the previous image at
radius $s_1 r$ to that on the current image at radius $r$. The initial
estimate of the corresponding location of the suspicious object
on the previous mammogram is then obtained as
$(s_1 r, s_2\theta)$.

Using  the  initial  estimate  of  the  centroid  of  the  object  on 
the  previous  mammogram,  we  can  define  a  fan-shaped 
search region bounded by $s_1 r \pm \delta$ and $s_2\theta \pm \epsilon$ as illustrated in
Fig. 3. The object found on the current mammogram is then
used as a template to search for a matching object in the
search region on the previous mammogram. The size of the
search region (defined by $\delta$ and $\epsilon$) depends on the variability
between  mammograms  obtained  from  one  examination  to  the 
other.  Since  it  is  difficult  to  predict  the  variability  of  an  elas¬ 
tic  and  deformable  object  such  as  the  breast  by  any  analytical 
method,  we  have  determined  this  variability  experimentally 
from  the  mammograms  in  our  data  set.  The  variation  in  com¬ 
pression  can  cause  a  change  in  the  relative  locations  of  vari¬ 
ous  breast  structures  on  these  images  as  well  as  a  rotation  of 
the  breast  boundary  with  respect  to  the  fixed  image  coordi¬ 
nates.  By  relating  the  position  of  a  breast  structure  to  the 
corresponding  nipple-centroid  axis,  and  by  performing  a 
search  in  the  corresponding  search  region,  we  can  reduce  the 
effect  of  this  variability.  In  this  study  we  have  estimated  the 
size  of  the  search  region  required  to  enclose  all  correspond¬ 
ing  objects  on  the  previous  mammogram  using  ground  truth 
objects  identified  on  the  previous  mammograms  by  a  radi¬ 
ologist.  The  distance  of  the  initial  estimate  of  the  center  of 
the  search  region  from  the  centroid  of  the  ground  truth  object 
was  also  estimated. 
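
The initial estimate in step 2 can be sketched as follows. This is an illustration under assumptions, not the implementation used in the study: points are (x, y) tuples in a common unit, the helper names are hypothetical, and the angular scale factor is passed in directly because computing the angular width of the breast requires the segmented boundary.

```python
import math

def to_polar(point, nipple, centroid):
    """Polar coordinates of `point` about the nipple.

    The angle is measured from the nipple-centroid axis (the 0 degree axis).
    """
    px, py = point[0] - nipple[0], point[1] - nipple[1]
    ax, ay = centroid[0] - nipple[0], centroid[1] - nipple[1]
    r = math.hypot(px, py)
    theta = math.atan2(py, px) - math.atan2(ay, ax)
    # Wrap the angle into [-pi, pi].
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return r, theta

def radial_scale(nipple_cur, centroid_cur, nipple_prev, centroid_prev):
    """Ratio of nipple-centroid distances, previous over current."""
    return math.dist(nipple_prev, centroid_prev) / math.dist(nipple_cur, centroid_cur)

def initial_estimate(r, theta, s1, s2):
    """Estimated polar location on the previous mammogram."""
    return s1 * r, s2 * theta
```

For example, an object at `(0, 5)` with the nipple at the origin and the breast centroid at `(10, 0)` has polar coordinates `(5, pi/2)`, and scale factors of 1.2 and 0.9 move the estimate to `(6, 0.45*pi)`.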

The  third  and  final  step  in  the  regional  registration  proce¬ 
dure  involves  a  systematic  search  to  identify  a  corresponding 
structure  within  the  fan-shaped  search  region  on  the  previous 
mammogram.  In  this  study  we  have  evaluated  two  different 
search  criteria.  The  first  criterion  is  based  on  gray  scale  tem¬ 
plate  matching.  A  rectangular  gray  scale  template  centered 
on  the  mass  centroid  is  extracted  from  the  current  mammo¬ 
gram.  The  choice  of  the  size  of  the  template  region  can  affect 
the  accuracy  of  the  registration  technique.  The  minimum  re¬ 
quired  size  of  a  rectangular  template  is,  of  course,  a  rectan¬ 
gular  region  which  encloses  the  mass  exactly.  However,  one 
can  also  include  a  small  portion  of  the  background  region  in 
the  template.  We  have  analyzed  the  performance  of  our  al¬ 
gorithm  using  two  different  sizes  for  this  template.  The  first 
includes  a  1-pixel-wide  background  region  all  around  the 
boundary  of  the  suspicious  object  while  the  second  includes 
a 5-pixel-wide background region. For each pixel $(i,j)$ in the
fan-shaped region on the previous mammogram, a region of
interest (ROI) centered on the pixel and of the same size as
the mass template is extracted. We denote the $(m,n)$th pixel
in the gray scale template extracted from the current mammogram
as $p(m,n)$ and that from the ROI obtained from the
fan-shaped region as $q_{ij}(m,n)$. A correlation measure defined as

$$C(i,j) = \frac{\sum_{m,n} [p(m,n)-\bar{p}]\,[q_{ij}(m,n)-\bar{q}]}{\sqrt{\sum_{m,n} [p(m,n)-\bar{p}]^2 \, \sum_{m,n} [q_{ij}(m,n)-\bar{q}]^2}} \tag{3}$$

is then obtained for each pixel $(i,j)$ within the search region
on the previous mammogram. Here the summation is performed
over the mass template, and $\bar{p}$ and $\bar{q}$ denote the average
pixel values in the template and ROI, respectively. The
correlation values in the search region are then smoothed by
a 3 × 3 averaging kernel to reduce fluctuations. The final
estimate  of  the  location  of  the  mass  centroid  on  the  previous 
mammogram  is  obtained  as  the  location  corresponding  to 
maximum  correlation.  The  second  search  criterion  is  based 
on  maximizing  the  mutual  information  between  the  mass 
template  and  the  ROI  extracted  from  within  the  search  re¬ 
gion.  The  MI  approach  is  similar  to  that  described  earlier  for 
alignment  of  the  breast  regions,  except  that  the  regions  to  be 
matched  are  limited  to  the  size  of  the  mass  template. 
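
The correlation-based search in step 3 can be sketched as below. The sketch assumes the correlation measure is the mean-subtracted, normalized (Pearson) cross-correlation, that the template has odd dimensions, and that the search region is supplied as a Boolean mask; all function names are hypothetical. Correlation values outside the mask are treated as zero during smoothing, which is a simplification.

```python
import numpy as np

def correlation(template, roi):
    """Normalized cross-correlation of two equally sized gray-level patches."""
    p = template - template.mean()
    q = roi - roi.mean()
    denom = np.sqrt((p * p).sum() * (q * q).sum())
    return float((p * q).sum() / denom) if denom > 0 else 0.0

def smooth3x3(values):
    """3x3 averaging; edges are replicated so the shape is preserved."""
    padded = np.pad(values, 1, mode="edge")
    h, w = values.shape
    return sum(padded[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)) / 9.0

def match_template(previous, template, search_mask):
    """Return (row, col) of the maximum smoothed correlation inside the mask."""
    th, tw = template.shape
    top, left = th // 2, tw // 2
    scores = np.zeros(previous.shape, dtype=float)
    for i, j in zip(*np.nonzero(search_mask)):
        r0, c0 = i - top, j - left
        if (r0 < 0 or c0 < 0 or r0 + th > previous.shape[0]
                or c0 + tw > previous.shape[1]):
            continue  # ROI would extend past the image border
        scores[i, j] = correlation(template, previous[r0:r0 + th, c0:c0 + tw])
    smoothed = smooth3x3(scores)
    smoothed[~search_mask] = -np.inf   # only pixels in the search region compete
    row, col = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return int(row), int(col)
```

The mutual information criterion would replace `correlation` with a mutual information score computed on the same template and ROI.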

Once  a  corresponding  structure  is  found  on  the  previous 
mammogram  for  a  suspicious  object  on  the  current  mammo¬ 
gram,  it  can  be  used  for  an  interval  change  analysis  within  a 
CAD scheme, as we have shown in an independent study.
If  the  search  procedure  in  the  fan-shaped  region  does  not 
yield  a  corresponding  region,  then  the  suspicious  object  on 
the  current  mammogram  can  be  considered  as  a  newly  de¬ 
veloped  density.  Objects  for  which  no  corresponding  object 
can  be  found  on  the  previous  mammogram  can  be  analyzed 
with  methods  designed  for  single  images  in  an  overall  CAD 
scheme.  Note  that  in  this  study  the  search  techniques  are 
structured  in  a  way  to  always  determine  a  matching  object. 
Search  criteria  to  identify  new  densities  will  be  developed  in 
future  studies. 

B.  Image  acquisition  and  data  set 

The  data  set  for  this  study  consisted  of  127  images  ob¬ 
tained  from  the  files  of  34  patients  who  had  undergone  bi¬ 
opsy  at  the  University  of  Michigan.  From  these  127  mam¬ 
mograms,  74  temporal  pairs  of  images  were  obtained.  The 
current  mammogram  of  each  temporal  pair  exhibited  a 
biopsy-proven mass. All previous mammograms in the 74
temporal  pairs  contained  a  mass,  a  structure,  or  a  density 
which  the  radiologist  could  match  to  the  mass  detected  in  the 
corresponding current image. Since some patient files contained
a sequence of mammograms over three years, the
number of temporal pairs was larger than half the number of
images. The 74 temporal image pairs comprised 43
cranio-caudal views and 31 mediolateral-oblique views.

The mammograms of 20 temporal pairs were digitized
with a LUMISYS DIS-1000 laser scanner at a pixel resolution
of 0.1 mm × 0.1 mm and with 12 bit resolution. The digitizer
was calibrated so that the gray values were linearly and
inversely proportional to the optical density (OD) within the
range of 0.1-2.8 OD units, with a slope of -0.001 OD/pixel
value. Outside this range, the slope of the calibration curve
decreased gradually. The OD range of this digitizer was
0-3.5. The mammograms of the remaining 54 temporal pairs
were digitized with a LUMISCAN 85 laser scanner at a pixel
resolution of 0.05 mm × 0.05 mm and with 12 bit resolution.
This digitizer was calibrated so that the gray values were
linearly and inversely proportional to the OD within the
range 0-4 OD units, with a slope of -0.001 OD/pixel value.
All images were subsequently reduced to 0.8 mm resolution
by averaging adjacent 8 × 8 pixels (20 pairs) or 16 × 16
pixels (54 pairs). Since the same digitizer was used for
digitizing  all  films  of  the  same  case,  the  differences  in  the 
digitizers  would  have  no  effect  on  the  analysis  of  each  image 
pair.  Given  the  small  differences  between  the  two  laser  digi¬ 
tizers  and  the  large  differences  in  the  imaging  technique  and 
in  the  breast  appearance  from  one  case  to  another,  it  could  be 
expected  that  the  use  of  cases  collected  with  the  two  different 
digitizers  would  not  affect  the  evaluation  of  the  registration 
technique. 
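
The resolution reduction by block averaging can be sketched as follows. The function name `block_average` is hypothetical, and discarding rows and columns that do not fill a complete block is an assumption of this sketch rather than a detail given in the text.

```python
import numpy as np

def block_average(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    h, w = image.shape
    h_trim, w_trim = h - h % factor, w - w % factor   # drop incomplete blocks
    trimmed = image[:h_trim, :w_trim].astype(float)
    blocks = trimmed.reshape(h_trim // factor, factor, w_trim // factor, factor)
    return blocks.mean(axis=(1, 3))
```

A factor of 8 applied to 0.1 mm pixels and a factor of 16 applied to 0.05 mm pixels both yield 0.8 mm pixels.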

While  the  regional  registration  technique  can  be  used  for 
determining  a  corresponding  structure  or  region  for  any 
structure  (both  false  positives  and  masses)  in  the  breast,  in 
this  study  we  have  analyzed  its  accuracy  on  biopsy-proven 
masses  alone.  The  location  of  the  mass  on  the  current  mam¬ 
mogram  was  identified  by  an  MQSA-certified  radiologist  ex¬ 
perienced  in  breast  imaging.  The  radiologist  manually  iden¬ 
tified  the  corresponding  region  on  the  previous  mammogram 
and  the  nipple  location  on  both  the  current  and  the  previous 
mammograms  using  an  interactive  image  analysis  tool  on  a 
UNIX  workstation.  For  each  current  mammogram,  the 
boundary  of  the  mass  was  manually  delineated  by  the  radi¬ 
ologist  using  an  image  display  program  developed  in  our 
laboratory.  A  bounding  box  enclosing  the  corresponding  ob¬ 
ject  on  the  previous  mammogram  was  provided  by  the  radi¬ 
ologist  for  each  of  the  masses.  Each  mass  as  well  as  the 
corresponding  structure  on  the  previous  mammogram  was 
rated for its visibility on a scale of 1-10, where the rating of
1 corresponded to the most visible category. The size of the
mass on the current mammogram as well as the size of the
corresponding structure on the previous mammogram was
also provided by the radiologist. For previous mammograms
on which the radiologist could not identify a distinct mass,
the “mass” was assigned a size of 0 mm. The parenchymal
density was rated based on the BIRADS lexicon. The distributions
of the size and visibility ratings for benign and malignant
cases in this data set are shown in Figs. 4 and 5.

Fig. 4. Distribution of the size of the mass on the current mammogram with
respect to the size of the corresponding structure on the previous mammogram
as estimated by an experienced breast radiologist for benign (B) and
malignant (M) cases in the data set. Axis label: size of mass in current
mammogram (mm).

Fig. 5. Distribution of the visibility of the mass on the
current mammogram with respect to the visibility of a
corresponding structure on the previous mammogram
as rated by an experienced breast radiologist for benign
(B) and malignant (M) cases. In this rating scale the
visibility of the masses decreases from 1 to 10 with 10
being the least visible. The total number of points in
these two graphs is less than the total number of mammogram
pairs in our database, because mammogram
pairs with the same rating appear as a single point.
Axis label: visibility in current mammogram.

C.  Evaluation  of  registration  accuracy 

The  bounding  box  enclosing  the  corresponding  object  on 
the  previous  mammogram  provided  by  the  radiologist  was 
used  as  the  “ground  truth”  to  evaluate  the  accuracy  of  the 
regional  registration  technique.  We  have  used  two  different 
measures  for  assessing  registration  accuracy.  The  first  mea¬ 
sure  quantifies  whether  the  corresponding  region  is  correctly 
identified  by  the  registration  algorithm.  This  measure  is  com¬ 
puted  simply  as  the  number  of  cases  in  which  the  estimated 
centroid  location  of  the  mass  on  the  previous  mammogram  is 
inside  the  bounding  box  provided  by  the  radiologist.  The 
second  measure  quantifies  the  error  in  the  estimate  of  the 
corresponding  region  on  the  previous  mammogram  and  is 
defined  as  the  Euclidean  distance  between  the  estimated  cen¬ 
troid  of  the  corresponding  region  and  the  center  of  the 
bounding box provided by the radiologist. Together these
two measures answer the questions: (a) does regional registration
work? (b) how well does the technique perform in
matching structures between the current and previous mammograms?
In Sec. III we provide the results of regional registration
with and without global breast alignment and using
both correlation and mutual information as the search criterion
in step 3.

Fig. 6. Left: most recent or current mammogram. Right: previous mammogram.
The breast images are superimposed with the breast borders detected
by a breast boundary tracking algorithm.
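
The two accuracy measures can be computed with a short sketch such as the one below. It assumes estimates are (x, y) points and ground-truth bounding boxes are (x_min, y_min, x_max, y_max) tuples in the same units; the function name `evaluate` and this data layout are hypothetical.

```python
import math

def evaluate(estimates, boxes):
    """Return (number of estimates inside their box, list of distance errors).

    The distance error is the Euclidean distance from the estimate to the
    center of the radiologist-provided bounding box.
    """
    hits = 0
    errors = []
    for (x, y), (x0, y0, x1, y1) in zip(estimates, boxes):
        if x0 <= x <= x1 and y0 <= y <= y1:
            hits += 1
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        errors.append(math.hypot(x - cx, y - cy))
    return hits, errors
```

Dividing the hit count by the number of temporal pairs gives the accuracy figure, and averaging the error list gives the mean Euclidean distance error.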

III.  RESULTS 

To  provide  the  reader  with  a  qualitative  idea  of  algorithm 
performance  we  first  illustrate  the  intermediate  results  at 
various  stages  of  the  algorithm.  Then  the  results  of  each  of 
the  three  steps  of  the  algorithm  are  presented  with  an  analy¬ 
sis  of  the  dependence  of  the  performance  on  various  algo¬ 
rithm  parameters.  Also  presented  is  an  analysis  of  the  accu¬ 
racy  of  regional  registration  using  the  error  measures  defined 
in Sec. II C. In the following sections, the term “initial estimate”
refers to the estimate of the center of the search region
in step 2 of regional registration. The term “final estimate”
refers to the outcome of the search procedure adopted
in  step  3  and  represents  the  overall  result  of  regional  regis¬ 
tration. 

A.  Intermediate  results  of  regional  registration 

Figures  6-8  show  an  example  of  the  intermediate  and 
final  results  of  applying  the  regional  registration  technique  to 
a  temporal  pair  of  mammograms.  The  original  digitized 
mammograms (current and previous), with the automatically
tracked breast boundaries superimposed, are shown in
Fig. 6. The location of the mass on the current mammogram
is shown in Fig. 7 along with the corresponding radiologist-identified
region on the previous mammogram. Figure 8
shows the fan-shaped search region on the previous mammogram
estimated in step 2 of regional registration. The initial
estimate is at the center of this search region which is to be
used in step 3 for localization of the corresponding mass.
The centroid location of the corresponding object estimated
by the algorithm using the correlation measure as the search
criterion is also shown in Fig. 8.

Fig. 7. Left: location of the mass on the current mammogram. Right:
radiologist-identified region on previous mammogram corresponding to the
mass on the current mammogram.

Fig. 8. The fan-shaped search region on the previous mammogram. The
initial computer estimate of the centroid location of the region corresponding
to the mass is at the center of the search region. The final estimate of the
centroid of the corresponding region (indicated by X) is obtained by using
the correlation criterion within the fan-shaped search region.

B.  Initial  estimates  and  search  regions 

Figure  9  shows  histograms  of  the  Euclidean  distance  be¬ 
tween  the  initial  estimate  of  the  centroid  location  of  the  cor¬ 
responding  structure  on  the  previous  mammogram  and  the 
center  of  the  bounding  box  provided  by  the  radiologist.  For 
the  74  temporal  image  pairs  used  in  this  data  set,  the  average 
Euclidean  distance  error  of  the  initial  estimate  was  10.5  mm 
(std.  dev.  6.4  mm)  without  the  GBA  procedure  and  9.8  mm 
(std.  dev.  6.0  mm)  with  the  GBA  procedure.  The  overall 
accuracy  was  46%  in  both  cases,  i.e.,  in  34  of  the  74  tempo¬ 
ral  image  pairs  the  initial  estimate  was  inside  the  ground- 
truth  bounding  box.  Based  on  observation  of  the  radial  de¬ 
viation  errors  and  the  angular  deviation  errors  (defined  in 
Sec. IV) in Figs. 10 and 11, a search region defined by
$\epsilon = 0.35 + 5/r$ rad and $\delta = 20$ mm with GBA ($\delta = 25$ mm for no
GBA), where $r$ is the radial distance from the nipple, was
used  for  the  evaluation  of  the  local  search  criteria  used  in 
step  3  of  regional  registration. 

C.  Local  search  criteria  and  final  estimates 

Figure  12  shows  the  histograms  of  the  Euclidean  distance 
errors  of  the  final  estimate  of  the  corresponding  structure 
using  the  correlation  measure  as  the  search  criterion.  Table  I 
summarizes  the  results  along  with  the  average  Euclidean  dis¬ 
tance  errors  and  standard  deviations  using  both  the  correla¬ 
tion  and  the  mutual  information  search  criteria  and  with  and 
without  the  GBA  procedure.  The  average  Euclidean  distance 
errors  and  deviations  for  the  cases  where  the  final  estimate  is 
inside  the  ground-truth  region  identified  by  the  radiologist 
and  the  cases  where  it  is  outside  are  also  listed  separately. 
Regional  registration  incorporating  the  GBA  procedure  and 
using  correlation  as  a  search  criterion  has  an  accuracy  of 
85%.  In  63  of  the  74  temporal  image  pairs,  the  final  estimate 
of  the  location  of  the  corresponding  region  was  inside  the 
radiologist-identified  ground-truth  region.  The  use  of  mutual 
information  as  a  search  criterion  yielded  an  accuracy  of  74% 
(55  out  of  74  temporal  pairs).  The  average  Euclidean  dis¬ 
tance  error  for  regional  registration  incorporating  GBA  and 
correlation  was  4.7  mm  (std.  dev.  5.8  mm)  for  all  74  tempo¬ 
ral  pairs  and  2.8  mm  (std.  dev.  1.9  mm)  in  85%  (63/74)  of 
the  temporal  pairs.  Use  of  mutual  information  as  a  search 
criterion in step 3 results in values of 7.2 mm (std. dev. 8.6
mm)  and  3.0  mm  (std.  dev.  2.0  mm),  respectively,  for  the 
same  quantities. 

IV.  DISCUSSION 

A.  Initial  estimates  and  search  regions 

From  the  histograms  of  Fig.  9,  we  observe  that  the  use  of 
the  GBA  procedure  results  only  in  a  marginal  improvement 
in  the  initial  estimate,  if  the  Euclidean  distance  error  is  the 
only  measure  considered.  However,  the  GBA  procedure  has 
a  significant  effect  in  reducing  the  size  of  the  search  region 
required for regional registration. In order to compute the
required sizes ($\delta$ and $\epsilon$ in Fig. 3) of the search region, we
computed two quantities, the radial distance deviation and
the angular deviation, using the initial estimate obtained
from step 2 for the 74 temporal image pairs. The radial distance
deviation is defined as the absolute difference between
$s_1 r$ and $r_c$, where $r_c$ is the radial distance of the center of
the ground-truth region from the nipple location on the previous
mammogram. The histograms of radial distance deviations
for the 74 temporal image pairs with and without the
GBA procedure are shown in Fig. 10. An important observation
is that a $\delta$ value of 25 mm is needed to include the
centers of the ground-truth structures if the GBA procedure
is not used in step 1. The use of the GBA procedure results
in a decrease in the value of $\delta$ to 20 mm. This decrease helps
significantly increase the overall accuracy of the regional
registration as discussed below.

Fig. 9. Histograms of Euclidean distance between the initial estimate of
the centroid location of the corresponding object and the center of the
radiologist-identified object on the previous mammogram with and without
GBA.

Fig. 10. Histograms of radial distance deviation between the initial estimate
of the centroid location of the corresponding object and the center of the
radiologist-identified object on the previous mammogram with and without
GBA. Axis label: radial deviation error (mm).

In Fig. 11 the angular deviation of the initial estimate is plotted
against the radial distance of the centers of the ground-truth regions
on the previous mammogram. The angular deviation is measured by the
angle between the nipple-ground-truth center vector and the
nipple-centroid axis. In an earlier study using both false positives and
masses, we observed that the value of $\epsilon$ needed to include the
center of the ground-truth region decreases with distance from the
nipple, i.e., increases with distance from the chest wall. This may be
attributed to the increased deformability of the breast tissue closer to
the nipple compared to the tissue closer to the chest wall. A possible
way to account for this variability is to use a variable $\epsilon$ that
varies inversely with the radial distance $r$ from the nipple. For the
data set in this study, we investigated several forms for this
dependence, all of which fit the general model

$\epsilon = \epsilon_0 + K/r$.

Here $\epsilon_0$ and $K$ are two constants which affect the form of the
dependency. Based on our observation of the angular deviations for the
entire data set of 74 temporal pairs, we chose $\epsilon_0 = 0.35$ rad
and $K = 5$ rad-mm. As can be seen from Fig. 11, with these values of
$\epsilon_0$ and $K$, all of the centers of the ground-truth regions are
within the search region. Therefore, a search region defined by
$\epsilon = 0.35 + 5/r$ rad and $\delta = 20$ mm (if GBA was applied) or
$\delta = 25$ mm (if GBA was not applied) was used for evaluation of the
local search criteria used in step 3 of regional registration.

Fig. 11. Angular deviation between the initial estimate of the centroid
location of the corresponding object and the center of the
radiologist-identified object on the previous mammogram with and without
GBA. Also shown are the bounding lines defined using
$\epsilon = 0.35 + 5/r$ rad.
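
The search-region definition above can be illustrated with a minimal
Python sketch. It is not the authors' implementation. It assumes that
$\delta$ and $\epsilon$ act as the largest allowed radial and angular
deviations from the initial estimate, that $r$ is the nipple-to-estimate
distance, and that all points are given as (x, y) in mm. The function
names are hypothetical.

```python
import math


def angular_extent(r_mm, eps0=0.35, k=5.0):
    """Angular extent eps = eps0 + K/r in radians (r in mm, r > 0)."""
    if r_mm <= 0:
        raise ValueError("r_mm must be positive")
    return eps0 + k / r_mm


def in_search_region(candidate, estimate, nipple, delta_mm, eps0=0.35, k=5.0):
    """Return True if candidate lies in the fan-shaped region around estimate.

    Assumption: delta_mm bounds the radial deviation and eps(r) bounds the
    angular deviation, both measured in polar coordinates about the nipple.
    """
    def polar(point):
        dx, dy = point[0] - nipple[0], point[1] - nipple[1]
        return math.hypot(dx, dy), math.atan2(dy, dx)

    r_c, theta_c = polar(candidate)
    r_e, theta_e = polar(estimate)
    # Wrap the angular difference into [-pi, pi) before taking its magnitude.
    d_theta = abs((theta_c - theta_e + math.pi) % (2 * math.pi) - math.pi)
    return abs(r_c - r_e) <= delta_mm and d_theta <= angular_extent(r_e, eps0, k)
```

With the values reported above, `delta_mm` would be 20 when GBA is
applied and 25 otherwise.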


Fig. 12. Histograms of Euclidean distance error for corresponding
regions estimated by regional registration using the correlation measure
in step 3 with and without GBA. This error is defined as the Euclidean
distance between the centroid location of the estimated corresponding
region and the center of the radiologist-identified ground-truth
corresponding region on the previous mammogram.

Table I. Accuracy of regional registration using the correlation measure
and the mutual information measure in step 3, with and without global
breast alignment (GBA), using a 1-pixel-wide background region for the
template from the current mammogram. Correct estimates are the cases
where the estimated centroid location was within the bounding box of the
radiologist-identified object location.

Method                          Accuracy     Overall average  Average error (mm)  Average error (mm)
                                             error (mm)       for correct         for incorrect
                                                              estimates           estimates
Correlation without GBA         77% (57/74)  7.4±10.2         2.8±2.0             22.9±11.5
Mutual information without GBA  68% (50/74)  8.8±10.5         3.0±2.0             20.7±11.1
Correlation with GBA            85% (63/74)  4.7±5.8          2.8±1.9             15.7±8.3
Mutual information with GBA     74% (55/74)  7.2±8.6          3.0±2.0             19.4±8.9

B.  Local  search  criteria  and  final  estimates 

We have evaluated the use of correlation and mutual information as the
local search criteria. From Table I we observe that the GBA procedure
results in a higher accuracy irrespective of the search criterion. While
the use of mutual information as a search criterion performs reasonably
well by itself (74% accuracy with an average error of 7.2 mm), the use
of the correlation measure was observed to result in more accurate
registration. For the images in this data set, the correlation measure
outperformed the mutual information measure irrespective of whether the
breast centroids were computed with or without the GBA procedure.
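
The two search criteria compared above can be sketched for a pair of
equally sized gray-scale patches. This is only an illustration under
stated assumptions: the correlation is taken to be the Pearson
correlation coefficient, the mutual information is estimated from a
joint histogram with an arbitrarily chosen 16 bins, and the function
names are hypothetical. The text does not specify these details.

```python
import numpy as np


def correlation(template, patch):
    """Pearson correlation coefficient between two equally sized patches."""
    t = template.astype(float).ravel()
    p = patch.astype(float).ravel()
    t = t - t.mean()
    p = p - p.mean()
    denom = np.sqrt((t * t).sum() * (p * p).sum())
    return float((t * p).sum() / denom) if denom > 0 else 0.0


def mutual_information(template, patch, bins=16):
    """Mutual information (in nats) estimated from a joint gray-level histogram."""
    joint, _, _ = np.histogram2d(template.ravel(), patch.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of the template, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of the patch, shape (1, bins)
    nonzero = pxy > 0
    return float((pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])).sum())
```

In a local search, either score would be evaluated at each candidate
position inside the search region and the position with the highest
score kept as the estimate.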

A few observations on the 11 cases where the final estimate was outside
the radiologist-identified ground-truth corresponding region are in
order. In 7 of the 11 cases, although the radiologist did provide a
region corresponding to the mass on the current mammogram, the
corresponding structure on the previous mammogram was very subtle
(visibility rating 8 or higher) with indistinct boundaries. The
radiologist could only estimate the region where the mass would develop
rather than the mass itself, so the truth was uncertain. In one of the
remaining 4 cases, the mass was an architectural distortion in the
current mammogram. In a second (benign) case the mass shape had changed
considerably. Upon consultation of the pathology report, the radiologist
concluded that the mass was a benign cyst which had been aspirated in
the previous year, resulting in a substantial change in its shape. In
the third case, the proximity of the mass to the chest wall resulted in
it being incompletely imaged in the previous year compared to the
current year. In such cases the correlation measure of a neighboring
breast structure would tend to be higher than that of the corresponding
structure. In the fourth case, an overlap of two vessels was identified
as corresponding to the mass on the current mammogram, while the region
corresponding to the mass was observed to be extremely subtle. In almost
all of the 11 cases, the proximity of the corresponding region to a
dense structure, combined with the subtle nature of the structure on the
previous mammogram, rendered the correlation measure ineffective in
establishing correspondence. However, in clinical practice, these masses
will likely be categorized as a newly developed density. Criteria to
distinguish a newly developed density will be investigated in further
studies.

C.  GBA:  Area  overlap  vs  mutual  information 

For the images used in this study, the result of the GBA procedure based
on maximizing the area overlap between the breast regions in the two
images of a temporal pair is comparable to that based on maximizing the
mutual information (MI). However, our observation is that the mutual
information criterion is preferable to the area overlap criterion. The
area overlap measure suffers from the drawback that if the breast region
in one of the mammograms is uniformly smaller than that in the other,
i.e., the breast edge in one is completely within the breast edge in the
other, then there is no unique rotation angle at which the area overlap
is maximized. Although the range of rotation angles over which local
maxima of the area overlap occur is small, the resulting estimate of the
rotation angle for GBA may be suboptimal. The use of mutual information,
however, results in a single unique rotation angle at which MI is
maximized. In any case, as discussed earlier, the use of the GBA
procedure before computing the breast centroid results in a reduction in
the size of the search region. A smaller search region reduces the
likelihood that the mass template is matched to an incorrect structure
and, therefore, increases the accuracy and reduces the Euclidean
distance error.


D. Template size, scale factors, and computation times

The size of the background region in the gray-scale template extracted
from the current mammogram affects registration accuracy. For the 74
temporal pairs in this data set, the best performance was observed when
a 1-pixel-wide background region was included all around the boundary of
the mass template. A 5-pixel-wide background region resulted in a
decrease in accuracy and an increase in the average Euclidean distance
error. The accuracy progressively decreased and the Euclidean distance
error increased with an increase in the size of the background region in
the template. Figure 13 shows the distributions of the radial and
angular scale factors for the images used in this study. The radial
scale factor $s_1$ ranged from 0.94 to 1.05 for this data set. Use of
$s_1$ reduced the size of the search area by decreasing the required
value for $\delta$. The angular scale factor $s_2$ was very close to 1
in all cases and did not seem to make any major difference for the
images in this data set. On a final note, the computation time required
for regional registration incorporating correlation was on average 2 s
without GBA and 4 s with GBA on a UNIX workstation (DEC AlphaStation 600
series).
Fig. 13. Histograms of the radial scale factor and the angular scale
factor for 74 temporal image pairs. The radial scale factor $s_1$ is
estimated as the ratio of the nipple-centroid distances on the previous
and current images. The angular scale factor $s_2$ is estimated as the
ratio of the angular width of the breast on the previous image at radius
$s_1 r$ to that on the current image at radius $r$.


V.  CONCLUSIONS 

Radiologists are interested in determining any local changes in breast
tissue over time which may indicate a developing cancer. We have
developed a novel regional registration technique for temporal
registration of mammograms. This technique could become an important
component of a CAD scheme for mammographic analysis. Unlike other
techniques found in the literature, our regional registration technique
does not depend on the identification of landmark structures or control
points on the mammograms. It is based on a search technique that many
radiologists use and that has proven to be successful in mammographic
interpretation. After corresponding objects are found, they can be
analyzed for interval changes in a CAD scheme. Our preliminary results
indicate that the regional registration technique is promising in
identifying corresponding regions from temporal mammographic pairs. In
85% (63/74) of the cases the regional registration technique correctly
identified the corresponding region in the previous mammogram. For these
63 cases, it is highly encouraging to note that the estimated location
of the region corresponding to the mass in the current mammogram was, on
average, less than 3 mm from the radiologist-identified corresponding
location.

Areas for future work include the development of an automated technique
for identifying the nipple location on the mammograms, investigation of
other local search criteria such as Fourier descriptors and
shape-invariant moments to be used in the fan-shaped search region,
adaptive methods for determining the size of the search region, criteria
for identifying newly developed densities, application of regional
registration to false positives as well as masses, and studies with a
large data set to investigate the robustness of the regional
registration technique. The regional registration technique may also be
applicable to other related registration problems, such as the
registration of left and right mammograms.
As an ongoing effort to develop a computer aid for detection of masses
on mammograms, we recently designed an object-based region-growing
technique to improve mass segmentation. This segmentation method
utilizes the density-weighted contrast enhancement (DWCE) filter as a
preprocessing step. The DWCE filter adaptively enhances the contrast
between the breast structures and the background. Object-based region
growing was then applied to each of the identified structures. The
region-growing technique uses gray-scale and gradient information to
adjust the initial object borders and to reduce merging between adjacent
or overlapping structures. Each object is then classified as a breast
mass or normal tissue based on extracted morphological and texture
features. In this study we evaluated the sensitivity of this combined
segmentation scheme and its ability to reduce false positive (FP)
detections on a data set of 253 digitized mammograms, each of which
contained a biopsy-proven breast mass. It was found that the
segmentation scheme detected 98% of the 253 biopsy-proven breast masses
in our data set. After final FP reduction, the detection resulted in 4.2
FPs per image at a 90% true positive (TP) fraction and 2.0 FPs per image
at an 80% TP fraction. The combined DWCE and object-based region-growing
technique increased the initial detection sensitivity, reduced merging
between neighboring structures, and reduced the number of FP detections
in our automated breast mass detection scheme. © 1999 American
Association of Physicists in Medicine. [S0094-2405(99)00808-1]

Key words: computer-aided diagnosis, digital mammography, breast mass
detection, density-weighted contrast enhancement, region growing


I.  INTRODUCTION 

Mammographic screening has proven to be an effective method for early
detection of breast cancer. Women in a regular mammographic screening
program have a statistically significant reduction in breast cancer
mortality when compared to women not in such a program. In addition,
independent double reading by two radiologists has proven to
significantly increase the sensitivity of mammographic screening.
Therefore, regular screening and double reading would appear to be a
sensible approach for breast cancer detection. While regular screening
is emphasized in health care programs, the higher cost and increased
workload on the radiologists may make double reading by two radiologists
impractical in a general screening situation. Computer-aided diagnosis
(CAD) is one alternative that could allow a large number of mammograms
to be double read by a single radiologist aided by the computer. This
technique may improve the accuracy of both detection and
characterization of breast lesions.

Many researchers have been interested in computerized analysis of
mammograms, and a number of groups have developed algorithms for
automated detection of breast masses. The detection of spiculated masses
has been of particular importance because of their high likelihood of
malignancy. Karssemeijer et al., Kobatake et al., and Kegelmeyer et al.
have all proposed methods for detecting spiculated masses on digitized
mammograms. However, since a number of malignant masses are not
spiculated, other groups have tackled the general problem of identifying
all types of breast masses on digitized mammograms.

Our research group has reported on a method for automatically detecting
masses on digitized mammograms. The method employed multiple stages of
density-weighted contrast enhancement (DWCE) segmentation. The DWCE
segmentation was first applied to the full mammogram, and then reapplied
to local regions within the mammogram to improve object border
definition. A final object splitting stage was employed to eliminate
merging between neighboring or overlapping breast structures. False
positive (FP) reduction based on extracted morphological features was
applied after each segmentation step, with texture analysis used as a
final arbitrator between masses and normal structures. The segmentation
was evaluated on 168 digitized mammograms and achieved a performance of
4.4 FPs per image at a 90% true positive (TP) detection fraction and 2.3
FPs per image at an 80% TP detection fraction.

Our approach to mass detection has been to first identify all
significant structures within the breast region using a global
segmentation technique and then refine the initial object borders using
local processing. Finally, we differentiate between true masses and
normal structures using morphological and texture information. Our
method is therefore different from other detection algorithms that
utilize the object shape information for initial detection. The
disadvantage of our combined global and local detection approach is that
a large number of normal structures are identified in the initial stage.
This can lead to additional FPs if the classification is suboptimal.
However, the advantage of this approach is that it can identify
difficult masses, since the initial detection is not based on shape
information. The shape information is still used in the classification
stage to reduce FPs.

In this paper, we present an improved version of our two-stage DWCE
segmentation approach. This new scheme was designed to both increase
specificity and reduce the overall complexity of the segmentation. A
primary motivation is to develop a method for eliminating the merging
between neighboring structures in the local DWCE processing step and
thus improve local segmentation. We introduce an object-based
region-growing technique to perform this task. Improved local
segmentation serves a number of purposes. First, it improves the
morphological and texture information used for FP reduction and
eliminates the need for the shape-based splitting step. It also enables
us to eliminate two morphological FP reduction steps. This significantly
reduces the overall complexity of the detection program and should lead
to a more practical implementation in a general clinical setting. In
this paper, we summarize the intermediate and overall detection
performance of the improved mass segmentation algorithm and describe
some of its limitations.

II.  METHODS 
A.  Database 

The clinical mammograms used in this study were selected from the files
of patients who had undergone biopsy at the University of Michigan
Hospital. The mammograms were acquired with American College of
Radiology (ACR) accredited mammography systems. Kodak MinR/MRE
screen/film systems with extended cycle processing were used as the
image recorder. The mammography systems have a 0.3-mm focal spot, a
molybdenum anode, a 0.03-mm thick molybdenum filter, and a 5:1
reciprocating grid. The selection criterion used by the radiologists was
simply that a biopsy-proven mass existed on the mammogram. The data set
consisted of 253 mammograms from 102 patients, and it included 128
malignant and 125 benign masses. Sixty-three of the malignant and six of
the benign masses were judged to be spiculated by an MQSA-approved
radiologist. The size of the masses ranged from 5 to 29 mm (mean size =
12.5 mm), and their visibility ranged from 1 (obvious) to 5 (subtle)
(mean = 2.1). Figures 1 and 2 show the histograms of mass size and mass
visibility for the data set. These distributions characterize the
difficulty and diversity of the cases contained in the data set.

The mammograms were digitized with a LUMISYS DIS-1000 laser film scanner
with a pixel size of 100 µm and 12-bit gray level resolution. The gray
levels were linearly proportional to optical density in the 0.1 to 2.8
optical density unit (O.D.) range. The slope was 0.001 O.D./pixel value.
The slope gradually fell off in the 2.8 to 3.5 O.D. range. A large pixel
value corresponds to a low optical density with this digitizer.
Fig. 1. Histograms of mass size for the 253 masses contained in our data
set. Mass sizes were measured as the largest axis of the mass by an
experienced radiologist.

The location and extent of all the biopsy-proven masses were marked on
the original films. The radiologist then identified both the centroid of
the lesion and the smallest bounding box containing the entire lesion
using an interactive image manipulation tool on a workstation. Both
procedures were performed using the original marked film as a guide. The
lesion centroid was used to identify TP detections after the
morphological FP reduction step. If a segmented object was within 4 mm
of the mass centroid, it was considered a TP. All other segmented
objects were considered FPs. The final free-response receiver operating
characteristic (FROC) curves following texture-based classification used
the more precise mass bounding box for TP identification. A region was
considered a TP only when it contained more than 50% of the mass
bounding box.
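
The two TP criteria described above can be sketched as follows. This is
an illustrative reading rather than the authors' scoring code. It
assumes that "within 4 mm" refers to the distance between the object
centroid and the mass centroid, and that "more than 50%" refers to the
fraction of bounding-box pixels covered by the region. The function
names and the half-open bounding-box convention are hypothetical.

```python
import numpy as np


def centroid_hit(object_mask, mass_centroid_rc, pixel_mm, max_dist_mm=4.0):
    """True if the object centroid lies within max_dist_mm of the mass centroid."""
    rows, cols = np.nonzero(object_mask)
    if rows.size == 0:
        return False
    dist_pixels = np.hypot(rows.mean() - mass_centroid_rc[0],
                           cols.mean() - mass_centroid_rc[1])
    return bool(dist_pixels * pixel_mm <= max_dist_mm)


def bbox_hit(region_mask, bbox, min_fraction=0.5):
    """True if the region covers more than min_fraction of the bounding box.

    bbox is (row_start, row_stop, col_start, col_stop), half-open.
    """
    r0, r1, c0, c1 = bbox
    covered = np.count_nonzero(region_mask[r0:r1, c0:c1])
    area = (r1 - r0) * (c1 - c0)
    return bool(covered > min_fraction * area)
```

The `pixel_mm` argument converts pixel distances to millimetres, for
example 0.8 for an image subsampled to 800 µm pixels.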
Fig. 2. Histograms of mass subtlety for the 253 masses contained in our
data set. Mass subtleties were rated by an experienced breast
radiologist from 1 (obvious) to 5 (subtle).

Fig. 3. Block diagram of the breast mass segmentation scheme. A
digitized mammogram undergoes DWCE segmentation followed by object-based
region growing and then morphological and texture classification. The
performance of the segmentation scheme was evaluated by FROC analysis.


B. Density-weighted contrast enhancement segmentation

The block diagram for the proposed detection scheme is shown in Fig. 3.
Global DWCE segmentation was used to identify an initial set of breast
structures on the digitized mammograms. These objects were then used as
seed locations to perform gradient-based region growing. A thorough
description of the DWCE technique can be found in the literature.
Briefly, the DWCE technique employs an adaptive filter to enhance the
local contrast and thus accentuate mammographic structures in an image.
As the term implies, the parameters of the enhancement filter are based
on the local density within the image, and the filter is applied to the
image on a pixel-by-pixel basis. The filter is designed to suppress very
low contrast values, to emphasize the low to medium contrast values, and
to slightly deemphasize the high contrast values. The effect of
suppressing the extremely low contrast values is to reduce bridging
between adjacent breast structures. Pixels with low to medium contrast
values are enhanced so that more subtle structures can be detected.
Finally, the slight deemphasis of the high contrast structures is
included to provide a more uniform intensity distribution for detected
structures. After contrast enhancement, Laplacian-Gaussian edge
detection is applied and all enclosed objects are filled to produce a
set of detected structures for the image. The DWCE segmentation is
applied to mammograms that have been smoothed and subsampled from their
original 100 µm pixel size to an 800 µm pixel resolution. The DWCE stage
has been found to be effective in detecting most breast structures,
including a significant portion of breast masses. However, the DWCE
borders usually fall well inside the true borders of an object, and a
significant number of adjacent structures are merged into single
objects. This occurs most frequently when the adjacent breast structures
have some tissue overlap.


C.  Object-based  region-growing  segmentation 

1.  Initial  grayscale  region  growing 

Before gradient-based region growing was applied, an initial set of seed
objects was identified. This was accomplished by first identifying all
local maxima in the original gray-scale image which occurred within the
extent of the DWCE objects. Local maxima were defined using the ultimate
erosion technique described by Russ. In simple terms, a pixel was a
local maximum if and only if its value was at least as large as all
nearest neighbor pixel values. All maxima were identified and grown into
larger objects by a simple gray-scale region-growing technique as
follows. Gaussian smoothing ($\sigma = 2.0$) was applied to the
gray-scale image, and a maximum and a minimum pixel value threshold were
specified to select a range of acceptable pixel values. The thresholds
were defined as

$G_i^{\max} = 1.01\,G_i^{M}$ (1)

and

$G_i^{\min} = 0.99\,G_i^{M}$, (2)

where $G_i^{M}$ was the pixel value of the $i$th maximum and
$G_i^{\max}$ and $G_i^{\min}$ were the maximum and minimum pixel value
thresholds, respectively. All pixels within a radius of 20 pixels from a
maximum location and with a pixel value inside the defined range were
considered to be part of the object. This was repeated for all maxima
within an image. Figures 4(a)-4(d) show an original gray-scale image and
corresponding images with the DWCE objects, the local maxima, and the
gray-scale region-grown objects highlighted. The expanded objects were
used as seeds for the gradient-based region growing, described below.
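
The seed-growing rule of Eqs. (1) and (2) can be sketched for a single
local maximum. This is a minimal illustration, not the authors' code. It
assumes that the thresholds are computed from the smoothed image value
at the maximum, that the input has already been Gaussian smoothed, and
that pixel values are non-negative. The function name and arguments are
hypothetical.

```python
import numpy as np


def grow_from_maximum(smoothed, max_rc, radius=20, tolerance=0.01):
    """Return a boolean mask of pixels grown from one local maximum.

    A pixel is accepted when it lies within `radius` pixels of the maximum
    at (row, col) = max_rc and its value falls between 0.99 and 1.01 times
    the value at the maximum (for the default tolerance).
    """
    g_m = float(smoothed[max_rc])
    g_min, g_max = sorted(((1.0 - tolerance) * g_m, (1.0 + tolerance) * g_m))
    rr, cc = np.ogrid[:smoothed.shape[0], :smoothed.shape[1]]
    within_radius = (rr - max_rc[0]) ** 2 + (cc - max_rc[1]) ** 2 <= radius ** 2
    return within_radius & (smoothed >= g_min) & (smoothed <= g_max)
```

Repeating this call for every local maximum found inside the DWCE
objects would give the set of seed objects passed on to the
gradient-based stage.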

2.  Gradient  images 

A mammogram at 200 µm resolution was used in the gradient-based
region-growing stage. The 200 µm resolution image was obtained by
averaging 2×2 pixels from the original image. The reduced resolution
image had to be smoothed again before gradient filtering because the
mammographic tissue produced gradients not only within individual breast
structures but also throughout the background portions of the image.
Figure 5(b) shows the gradient magnitude image resulting from vertical
and horizontal Sobel filtering applied to the 200 µm gray-scale image
shown in Fig. 5(a). It clearly demonstrates the large number of
gradients throughout the image and the difficulty in applying
object-based region growing without additional smoothing. For our
application, the smoothing needed to reduce the spurious gradients was
accomplished by frequency-weighted Gaussian (FWG) filtering.
Frequency-weighted filtering is a technique in which all pixels within
the image are split into a base and a residual term. The residual is
either positive or negative. This technique produces three subimages
from an original image, $F$, where

$F = F_f + F_{sub+} + F_{sub-}$. (3)
region growing, and (f) the objects remaining after morphological FP reduction.


The first filter component, $F_f$, is a filtered version of the original
image. In our case, a Gaussian filter with a standard deviation of 10
was used. The second and third images are the positive and negative
residual images of $F - F_f$, respectively. The $F_{sub+}$ residual is
nonzero where the image intensity is larger than the local background,
and $F_{sub-}$ is nonzero where the image intensity is smaller than the
local background. For a particular image pixel, $(x,y)$, the residual
images are defined as

$F_{sub+}(x,y) = \begin{cases} F(x,y) - F_f(x,y), & F(x,y) > F_f(x,y), \\ 0, & \text{otherwise}, \end{cases}$ (4)

and

$F_{sub-}(x,y) = \begin{cases} F(x,y) - F_f(x,y), & F(x,y) < F_f(x,y), \\ 0, & \text{otherwise}. \end{cases}$ (5)


Two FWG filters were designed for sequentially processing the
mammograms. The first FWG filtering step reduced the gradients within
the breast structures and produced an intermediate image, $F_1$, which
had the form

F|(F)=iFf(F)  +  iF,„b-(F). 


where the $F_f$ and $F_{sub+}$ images were derived from the original 200
µm resolution gray-scale image. A second FWG filtering step was used to
eliminate gradients in the breast background. It produced image $F_2$,
which had the form

$F_2(F_1) = F_{sub+}(F_1)$, (7)

where the $F_{sub+}$ image was derived from image $F_1$. The result of
applying the two FWG filters to the original mammogram in Fig. 5(a) is
shown in Fig. 5(c). In this image, a significant amount of background
has been eliminated and the gradients in the remaining structures have
been reduced. Horizontal and vertical Sobel filters were then applied to
image $F_2$, and the magnitude was calculated to produce a gradient
image as shown in Fig. 5(d). Finally, 5×5 median filtering was used to
produce the final gradient image shown in Fig. 5(e). This image was used
in the gradient-based region-growing step.
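
The base-plus-residual split that underlies frequency-weighted filtering
can be sketched with NumPy alone. This is an illustration under stated
assumptions, not the authors' filter. The Gaussian is implemented as a
separable kernel truncated at three standard deviations with edge
padding, both of which are choices made here. The weights used to
recombine the subimages into the intermediate image are not reproduced.
Function names are hypothetical.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing with edge padding (kernel cut at 3 sigma)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Convolve along columns, then along rows; "valid" restores the input shape.
    rows = np.apply_along_axis(np.convolve, 0, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 1, rows, kernel, mode="valid")


def fwg_split(image, sigma=10.0):
    """Split an image into a smoothed base and positive/negative residuals."""
    f = image.astype(float)
    f_f = gaussian_blur(f, sigma)
    residual = f - f_f
    f_sub_plus = np.where(residual > 0, residual, 0.0)   # brighter than background
    f_sub_minus = np.where(residual < 0, residual, 0.0)  # darker than background
    return f_f, f_sub_plus, f_sub_minus
```

By construction the three returned images sum back to the input, and at
every pixel at most one of the two residual images is nonzero.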

3.  Final  gradienUbased  region  growing 

Each  initially  grown  object  (described  in  Sec.  II C  1 )  was 
again  grown  by  applying  an  adaptive  technique  to  the  gradi¬ 
ent  image.  Fj.  described  in  Sec.  II C  2.  The  region-growing 
technique  was  based  on  the  work  of  Chang  and  Li  and  their 
adaptive  homogeneity  test  for  determining  the  similarity  be- 
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Fig.  5  Processing  steps  used  to  define  the  gradient  images:  (a)  the  onginaJ  mammogram  with  the  mass  location  identified;  (b)  the  gradient  magnitude  image 
obtained  from  honzontal  and  vertical  Sobel  filtering  of  the  onginal  mammogram:  (c)  the  image  resulting  from  FWG  filtering  of  the  original  mammogram;  (d) 
the  gradient  magnitude  image  resulting  from  horizontal  and  vertical  Sobel  ftltenng  of  the  FWG  image;  and  (e)  the  image  resulting  from  median  filtering  of 
the  gradient  magnitude  image 


tween  region.s.  We  have  modified  this  technique  to  perform 
object- based  region  growing.  For  a  mammogram,  the  corre¬ 
sponding  gradient  image  was  smoothed  using  a  Gaussian 
filter  (rr^Z.O).  A  cumulative  distribution  function  (CDF)  of 
pixel  values  was  then  calculated  from  the  smoothed  gradient 
image  for  each  object.  For  each  object,  the  pixel  value 
thresholds  were  defined  as 

$$G^{\max}_{i,0} = \{\, g : \mathrm{CDF}_{i,0}(g) = 1.0 \,\} \qquad (8)$$

and

$$G^{\min}_{i,0} = \{\, g : \mathrm{CDF}_{i,0}(g) = 0.0 \,\}, \qquad (9)$$

where $g$ was a pixel value and $\mathrm{CDF}_{i,0}(g)$ was the cumulative pixel value distribution within the border of object $i$ for initial growing iteration 0. The initial growing thresholds simply correspond to the maximum and minimum pixel values within an object. Single-pixel growing was performed on all objects using the thresholds for each individual object to define a range of acceptable pixel values. In this context, single-pixel growing meant that growing was limited to only those pixels directly connected to the initial border. Once single-pixel growing was applied to all objects within the image, the thresholds were adjusted and a second iteration of growing was performed. Iterative single-pixel growing was

Medical  Physics,  Vol.  26,  No.  8,  August  1999 


employed to limit the influence of the order in which objects were grown within an image. The thresholds used for the $i$th object during the $j$th growing iteration were defined as

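[Equations (10) and (11), which define the two growing thresholds for object $i$ at iteration $j$ in terms of $\mathrm{CDF}_{i,j}(g)$, are not legible in the source.] (10), (11)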


where $\mathrm{CDF}_{i,j}(g)$ was the cumulative pixel value distribution from the smoothed gradient image within the current borders of object $i$. Single-pixel growing was applied to all objects within the image. This iterative procedure was repeated until no more connected pixels had a value within the appropriately defined range. Note that neighboring objects were not allowed to merge together during this region-growing stage, so growing between adjacent objects stopped with at least a one-pixel gap between them. Figures 4(d) and 4(e) show the initial seed objects and the final gradient-grown objects for the example shown in Fig. 4(a).

Fig. 6. Flowchart of the FP reduction scheme. The images were separated into ten independent groups. Each group underwent morphological reduction with the nine other groups used for classifier training. The reduced objects were recombined and stepwise feature selection was performed. The images were again separated into the ten groups and each group underwent LDA texture classification, again using the nine other groups for classifier training. All test scores were then recombined and final FROC analysis was performed.
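The iterative single-pixel growing of Sec. II C 3 can be sketched as follows. This is only an illustration under stated assumptions: the per-iteration thresholds are taken here as the minimum and maximum values currently inside each object (the rule given for iteration 0), which is a stand-in for the paper's own threshold update; the function name and the use of 8-connectivity are likewise hypothetical.

```python
# Sketch of iterative single-pixel, object-based region growing.
# ASSUMPTION: thresholds are recomputed each iteration as the min and max of the
# values currently inside the object; the paper's threshold update rule differs.
NEIGHBORS8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def grow_objects(gradient, labels, max_iter=100):
    """Grow labeled objects by one pixel layer per iteration.

    gradient: 2D list of smoothed gradient values.
    labels:   2D list of ints, 0 = background, k > 0 = object k.
    Returns a grown copy of labels; objects never touch each other.
    """
    rows, cols = len(gradient), len(gradient[0])
    labels = [row[:] for row in labels]

    def neighbors(r, c):
        for dr, dc in NEIGHBORS8:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

    for _ in range(max_iter):
        grew = False
        for obj in sorted({v for row in labels for v in row if v > 0}):
            members = [(r, c) for r in range(rows) for c in range(cols)
                       if labels[r][c] == obj]
            values = [gradient[r][c] for r, c in members]
            lo, hi = min(values), max(values)      # per-object acceptance range
            # Candidates: background pixels directly connected to the object.
            candidates = {(rr, cc) for r, c in members
                          for rr, cc in neighbors(r, c) if labels[rr][cc] == 0}
            for r, c in sorted(candidates):
                touches_other = any(labels[rr][cc] not in (0, obj)
                                    for rr, cc in neighbors(r, c))
                if lo <= gradient[r][c] <= hi and not touches_other:
                    labels[r][c] = obj             # add one pixel to the border
                    grew = True
        if not grew:                               # no object could grow further
            break
    return labels
```

For a uniform one-row image with seeds at both ends, `grow_objects([[5, 5, 5, 5, 5]], [[1, 0, 0, 0, 2]])` leaves the middle pixel unlabeled, which mirrors the one-pixel gap between adjacent objects described above.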


D.  False  positive  reduction 

The DWCE segmentation and region growing do not differentiate masses from normal tissues; therefore, a large number of breast structures were usually detected in each mammogram. Since the shape and texture of mass objects, in general, should be different from those of normal breast structures, a set of features was extracted from each detected object and used to differentiate between the detected structures. The feature set included both morphological and texture features. These features were then used in a sequential classification scheme to reduce the number of FP detections in the mammograms. The sequential application of different classifiers has been found to increase classification accuracy, and it also allows more computationally intensive classifiers to be applied to as few objects as possible. A flow chart depicting the general approach employed for FP reduction is shown in Fig. 6. In this study, morphological classification was initially used to eliminate objects that had shapes significantly different from breast masses. Texture features were then computed for all remaining objects and used with a linear classifier as a final arbiter between masses and normal structures. The following sections describe the major components of the FP reduction scheme.


1.  Morphological  feature-based  FP  reduction 

The mammograms were partitioned into a number of different groups so that the morphological classifiers could be trained and tested to differentiate masses from normal structures. In this study, the 253 mammograms were randomly partitioned into ten independent groups. Each mammogram was allowed to appear in only one group, and all images from the same patient were grouped together. The goal of the partitioning was to have approximately the same number of images in each group under the given constraints. Classification of the objects within each individual group was performed with a classifier trained using the objects from the nine other image groups. This allowed an approximate 9:1 training-to-test ratio for morphological classification. By rotating the test group through all ten image sets, each mammogram served as a test case once.
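The partitioning constraints above (ten groups, every image in exactly one group, all images of a patient kept together, roughly equal group sizes) can be sketched as below. The greedy balancing rule, the function names, and the synthetic patient assignment are assumptions for illustration; the paper does not state how the random partition was generated.

```python
# Sketch: patient-grouped ten-way partition with a rotating test group.
# ASSUMPTION: a greedy "fill the smallest group" rule is used for balancing.
import random

def partition_by_patient(image_to_patient, n_groups=10, seed=0):
    """Return n_groups lists of image ids; a patient never spans two groups."""
    by_patient = {}
    for image, patient in image_to_patient.items():
        by_patient.setdefault(patient, []).append(image)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)              # random assignment order
    patients.sort(key=lambda p: -len(by_patient[p]))   # stable sort: largest first
    groups = [[] for _ in range(n_groups)]
    for patient in patients:
        smallest = min(groups, key=len)                # keep group sizes balanced
        smallest.extend(by_patient[patient])
    return groups

def train_test_splits(groups):
    """Yield (train_images, test_images), rotating the test group."""
    for k, test in enumerate(groups):
        train = [img for j, g in enumerate(groups) if j != k for img in g]
        yield train, test

# Synthetic example: 253 images, two images per (hypothetical) patient.
images = {f"img{i:03d}": f"pt{i // 2:03d}" for i in range(253)}
groups = partition_by_patient(images)
```

Rotating through `train_test_splits(groups)` gives ten splits in which each image is used for testing exactly once, with roughly nine times as many training images as test images.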

Eleven morphological features were used in the initial differentiation of the detected structures. These features included the following object-based measures: number of perimeter pixels, area, perimeter-to-area ratio, circularity, rectangularity, and contrast. In addition, five normalized radial length (NRL) features introduced by Kilday et al. were also utilized. They included the NRL mean value, standard deviation, entropy, area ratio, and zero-crossing count. The definition for each morphological feature can be found in the literature. They are also included in Appendix A of this paper.

The morphological features were used as input variables for two different classifiers. A simple threshold classifier was followed by a linear discriminant analysis (LDA) classifier in the morphological FP reduction step. The simple threshold classifier set a maximum and minimum value for each morphological feature based on the maximum and minimum feature values found from the breast masses in the data set. The LDA classification was applied to all objects remaining after threshold classification. The LDA classifier is a linear classifier based on Fisher's discriminant, which is optimal for the two-class, multivariate normal, equal covariance problem. The LDA classifier was trained for each training set and applied to the appropriate test set. The LDA classifier produced a single discriminant score for each object in the test set. A threshold was defined as the maximum discriminant score of the masses. This threshold was applied to the test set to further differentiate breast masses from normal structures. The threshold was again based on all masses in the data set to ensure that no mass would be lost during this initial stage. Figure 4(f) shows the results of morphological FP reduction for the example depicted in the figure.
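The simple threshold classifier described above keeps an object only if every feature lies within the range spanned by the masses. A minimal sketch follows; the function names and the two-feature example values (area and circularity) are hypothetical and chosen only to make the behavior visible.

```python
# Sketch: per-feature min/max threshold classifier derived from the masses.
def fit_ranges(mass_features):
    """Return one (min, max) pair per feature, taken over all mass vectors."""
    columns = list(zip(*mass_features))
    return [(min(col), max(col)) for col in columns]

def passes(ranges, features):
    """True if every feature lies inside the range observed for the masses."""
    return all(lo <= x <= hi for (lo, hi), x in zip(ranges, features))

# Hypothetical [area, circularity] vectors, for illustration only.
masses = [[40.0, 0.7], [120.0, 0.9]]
detected = [[80.0, 0.8], [300.0, 0.8], [60.0, 0.2]]

ranges = fit_ranges(masses)
kept = [obj for obj in detected if passes(ranges, obj)]
```

Because the ranges are derived from the masses themselves, every mass used to fit them passes by construction, which matches the stated aim of not losing any mass at this stage.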

2.  Texture  feature-based  FP  reduction 

Texture-based classification followed the morphological FP reduction. A large set of multiresolution texture features was extracted for each detected object in the mammogram. Stepwise feature selection was then used to choose the most appropriate set of features for linear classification. The selected features were subsequently used with an LDA classifier to produce a single discriminant score for each detected object. The overall performance of the detection scheme was then evaluated with FROC analysis. The texture-based reduction scheme has been documented in the literature; therefore, this paper will only summarize the important components of the texture analysis and point out any differences from the previously described techniques.

Table I. The number of detected masses and FPs, the single-stage reduction, the mean object area (μ_Area), and the standard deviation of the object areas (σ_Area) for the initial stages in the mass detection scheme. Note that texture FP reduction followed the morphological FP reduction stage.

Stage                 TP fraction   FPs/image   Reduction   μ_Area (mm²)   σ_Area (mm²)
DWCE                  97%           49.1        -           33.6           66.8
Region growing        97%           45.3        8%          52.4           85.1
Morph. FP reduction   97%           35.5        22%         51.9           52.1

Regions of interest (ROIs) containing each object remaining after morphological FP reduction were extracted from the 100 μm resolution mammograms. The ROIs had a fixed size of 256×256 pixels and the center of each ROI corresponded to the centroid location of a detected object. The only exception was when the object was located near the border of the breast and a complete 256×256 pixel ROI could not be defined. In this case the ROI was shifted until the appropriate edge coincided with the border of the original mammogram.

Global and local multiresolution texture features, based on the spatial gray level dependence (SGLD) matrix, were used in texture analysis. An element of the SGLD matrix, $p_{d,\theta}(i,j)$, is defined as the joint probability that gray levels $i$ and $j$ occur at a given interpixel separation $d$ and direction $\theta$. In this study, 13 texture measures were defined for each SGLD matrix. These measures were correlation, energy, entropy, inertia, inverse difference moment, sum average, sum variance, sum entropy, difference average, difference variance, difference entropy, information measure of correlation 1, and information measure of correlation 2. The definitions for all texture measures can be found in the literature and are included in Appendix B of this paper.

The wavelet transform with a four-coefficient Daubechies kernel was used to decompose individual ROIs into different scales. For global texture features, four different wavelet scales, 14 different interpixel distances, and 2 different angles were used to produce 28 SGLD matrices. This resulted in 364 global multiresolution texture features (28 matrices × 13 measures) for each ROI. To further describe the information specific to the mass and its surrounding normal tissue, a set of local texture features was calculated for each ROI. Five rectangular subregions were segmented from each ROI: an object subregion defined by the detected object in the center and four peripheral regions at the corners. Eight SGLD matrices (four interpixel distances and two angles) and a total of 208 local features were calculated from the object subregion and the periphery. They included 104 features in the object region (8 matrices × 13 measures) and an additional 104 features defined as the difference between the feature values in the object and the periphery.
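To make the SGLD construction and the feature counts above concrete, the following sketch builds a normalized SGLD matrix for one distance and angle and evaluates three of the 13 measures. It assumes a symmetric matrix (each pixel pair counted in both directions) and supports only 0 and 90 degrees; these choices, the function names, and the toy image are illustrative assumptions, and the wavelet decomposition is not shown.

```python
# Sketch: SGLD (gray-level co-occurrence) matrix and three texture measures.
# ASSUMPTION: symmetric counting and angles restricted to 0 or 90 degrees.
import math

def sgld(image, d, angle, levels):
    """Normalized SGLD matrix for pixel distance d and angle 0 or 90 degrees."""
    dr, dc = {0: (0, d), 90: (d, 0)}[angle]
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows - dr):
        for c in range(cols - dc):
            i, j = image[r][c], image[r + dr][c + dc]
            counts[i][j] += 1
            counts[j][i] += 1          # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def energy(p):
    return sum(v * v for row in p for v in row)

def entropy(p):
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

def inertia(p):
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))

# Toy two-level image; a real ROI would be 256x256 with many gray levels.
toy = [[0, 0, 1, 1], [0, 0, 1, 1]]
p = sgld(toy, d=1, angle=0, levels=2)

# Feature bookkeeping stated in the text.
n_global = 28 * 13          # 28 SGLD matrices x 13 measures
n_local = 2 * 8 * 13        # object features plus object-periphery differences
n_total = n_global + n_local
```

The bookkeeping at the end reproduces the 364 global, 208 local, and 572 total features quoted in the text.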

In order to improve the generalization of the texture classification, stepwise feature selection was used to select a subset of features from the pool of 572 global and local features (364 global plus 208 local). Feature selection was performed using texture features derived from the ROIs obtained from all 253 images. A total of 40 texture features were selected by stepwise feature selection. Details on the application of stepwise feature selection can be found in our previous publications.

At this point in texture classification, the mammograms were again divided into the same ten partitions as described in the morphological FP reduction step. Texture classification was performed on each test group with a trained LDA classifier employing the selected features. The training was based on the texture features derived from the ROIs in the nine other image groups. The test scores within each group were combined with the scores from the other groups to form a complete test set of discriminant scores.
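The LDA scoring step can be illustrated with Fisher's discriminant, whose weight vector is the inverse pooled covariance applied to the difference of the class means. The sketch below is restricted to two features so that the matrix inverse can be written out by hand; the study used 40 selected texture features, and the data values and function names here are hypothetical.

```python
# Sketch: two-feature Fisher linear discriminant (illustration only).
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(class_a, class_b):
    """Pooled within-class covariance matrix (2x2) for two-feature data."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (class_a, class_b):
        m = mean_vector(rows)
        for x in rows:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    return [[v / dof for v in row] for row in s]

def fisher_weights(masses, normals):
    """w = S^-1 (mean_masses - mean_normals), with S the pooled covariance."""
    s = pooled_covariance(masses, normals)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    mm, mn = mean_vector(masses), mean_vector(normals)
    diff = [mm[0] - mn[0], mm[1] - mn[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def score(w, x):
    """Single discriminant score for one object."""
    return w[0] * x[0] + w[1] * x[1]

# Hypothetical training vectors for the two classes.
train_masses = [[2, 2], [3, 2], [2, 3], [3, 3]]
train_normals = [[0, 0], [1, 0], [0, 1], [1, 1]]
w = fisher_weights(train_masses, train_normals)
```

In the cross-validation scheme described above, the weights would be fitted on nine groups and `score` applied to the objects of the held-out group.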

The FROC analysis based on the single set of test scores was used to evaluate the overall performance of the segmentation method.
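An FROC curve of the kind used here can be traced by sweeping the decision threshold over the pooled discriminant scores and recording, at each threshold, the TP fraction and the number of FPs per image. The sketch below assumes that each mass is represented by at most one detected object and that higher scores are more mass-like; the function name and example values are hypothetical.

```python
# Sketch: FROC operating points from pooled test scores.
# ASSUMPTION: each mass corresponds to at most one detected object.
def froc_points(scores, is_mass, n_images, n_masses):
    """Return [(fps_per_image, tp_fraction)] as the threshold is lowered.

    scores:  discriminant score per detected object (higher = more mass-like)
    is_mass: True where the object is a true-positive detection
    """
    points = []
    for threshold in sorted(set(scores), reverse=True):
        kept = [m for s, m in zip(scores, is_mass) if s >= threshold]
        tp = sum(kept)
        fp = len(kept) - tp
        points.append((fp / n_images, tp / n_masses))
    return points

# Hypothetical scores for five objects found in two images with two masses.
points = froc_points([0.9, 0.8, 0.7, 0.4, 0.2],
                     [True, False, True, False, False],
                     n_images=2, n_masses=2)
```

Reading the resulting list from first to last corresponds to moving the threshold from its maximum to its minimum value.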

III. RESULTS

The number of TP and FP detections found following the DWCE, region-growing, and morphological FP reduction stages of the segmentation algorithm are summarized in Table I. The DWCE segmentation identified 97% of the breast masses. Table I also includes the reduction percentage, the mean object areas (μ_Area), and the standard deviations in the object areas (σ_Area) for these initial stages. Table II summarizes the mass type, mass size, mass subtlety, and the overall mammographic tissue density for the seven masses missed during the initial DWCE segmentation stage. Figure 7 shows examples of the cases where the mass was missed during the DWCE stage. Figure 8 shows example images with corresponding gradient and object images for cases that had problems during the region-growing stage. This figure contains an example where the mass stopped growing before it reached the correct edge, and an example where the mass was split into two pieces during region growing. Finally, Fig. 9 shows the FROC training and test performance for the complete segmentation scheme. A summary of the overall performance is given in Tables III and IV for a number of different TP detection fractions. The test performance for the combined DWCE and region-growing segmentation technique was 4.2 FPs per image at a 90% TP detection level and 2.0 FPs per image at an 80% TP level.

Table II. The mass type, mass size, mass subtlety, and mammographic tissue density for the mammograms where the mass was not identified by the initial segmentation. In the table, B identifies a benign lesion, M identifies a malignant lesion, the subtlety is on a scale of 1 (obvious) to 5 (subtle), and breast density uses the BIRADS density scale of 1 (fatty) to 4 (dense). Both the subtlety and density rankings were performed by an experienced breast radiologist.

Mass no.   Type   Size (mm)   Subtlety   Breast density
1          M      6           4          1
2          B      10          2          1
3          B      14          2          2
4          B      10          2          3
5          B      10          2          3
6          B      14          2          3
7          B      12          4          4
Average           10.9        2.6        2.4

Fig. 7. [caption incomplete in source] ... structures next to a lower contrast benign mass (mass 4 in Table II).

Fig. 8. A mammographic case containing a mass that stopped growing before it reached the correct edge (a)-(c) and a case containing a mass that was split into two pieces during growing (d)-(f). This figure includes (a) and (d) the original mammograms with the mass locations identified, (b) and (e) the corresponding gradient images, and (c) and (f) the final grown objects.

Petrick et al.: Combined adaptive enhancement and region-growing segmentation of breast masses

Fig. 9. The training and test FROC curves obtained following LDA classification using 40 selected texture features (horizontal axis: number of FPs per image). The training scores were obtained by averaging the nine training scores from each detected object. The FROC data points were obtained by varying the discriminant decision threshold from the maximum to the minimum value.

IV.  DISCUSSION 

The purpose of the initial DWCE segmentation stage was to have a method sensitive enough to identify breast masses but which also limited the number of normal structures detected. We have found the DWCE segmentation to be effective in this task. In this study, DWCE segmentation identified 246 of the 253 (97%) masses in the images. Table II summarizes the properties of the masses missed in DWCE segmentation. Masses 1 and 2 were missed because of a dense pectoral muscle visible on the mammogram which overwhelmed all lower-density structures (i.e., both mammograms had BIRADS category 1 breast density). The dense pectoral muscle caused the lower level of the DWCE intensity range to be set so high that lower intensity structures were missed. Figure 7(a) shows the mammogram of the missed malignant mass (mass 1 from Table II). The pectoral muscle is much denser than the mass. This led to the miss. One possible method for eliminating this type of miss may be to identify the pectoral muscle in the mammogram and to apply DWCE segmentation to only the remaining breast region. Mass 3 in Table II was missed because of the small contrast difference between the mass and the background tissue even though the mass was not particularly small or subtle. The mammogram containing this mass is depicted in Fig. 7(b). The remaining masses were missed in mammograms containing denser breast tissue. It was observed that DWCE segmentation had problems detecting masses that were located near much denser normal structures. The dense structures were detected but the masses were missed. Figure 7(c) shows an example of this type of miss. It shows the mammogram containing mass 4 from Table II. Again the dense pectoral muscle may have also hindered detection of the mass in this case. Other than these problems, the DWCE segmentation performed reasonably well as a first stage in mass segmentation. It could identify the majority of the masses while eliminating many of the lower contrast background structures. However, the DWCE segmentation usually underestimated the actual borders of most structures. It also had a tendency to merge the mass with neighboring structures that may have had some tissue overlap with the breast mass. A total of 48 masses had significant merging between the mass and adjacent tissues after DWCE segmentation. This limited the effectiveness of the morphological FP reduction step and limited the localization of the mass during texture-based classification.

The region-growing stage reduced the effects of object merging and significantly increased the size of the initial DWCE objects. This is clearly shown in Table I where the average size of a structure increases from 33.6 mm² with DWCE alone to 52.4 mm² following region growing. Likewise, a comparison of objects from Figs. 4(b) and 4(e) shows the improvement in border definition following region growing. A combination of gray-scale and gradient-based region growing was used because of the difficulty in stopping gray-scale region growing at the correct edge and the need for large seed objects in gradient-based region growing. The combination approach performed adequately in our detection task and led to an improvement in both morphological and texture-based FP reduction. However, some problems were observed. One problem was that small and low-contrast structures had a tendency to grow into the background and become large regions even though the actual structures were quite small. This did not occur with masses, but it did occur with other breast structures. Another problem was that structures containing internal gradients did not always grow to the correct border, but ended up containing only a section of the true object. This occurred with some mass objects and led to either inaccurate structural information or a mass being split into multiple pieces. Figure 8 shows an example of both incomplete growing and a mass split into pieces during region growing. While these problems reduced the effectiveness of the morphological FP reduction, we have found that the overall benefit of region growing outweighs its drawbacks and leads to an improvement in detection accuracy with our segmentation scheme.

The final step in the segmentation was FP reduction. Morphological feature classification was performed first in our reduction scheme. The morphological classification reduced the number of FPs per image from 45.3 to 35.5 as shown in Table I. Following morphological reduction, the average size of the objects was similar to the average size before reduction, but the standard deviation in object size fell from 85.1 mm² before reduction to 52.1 mm² after reduction. This indicates that morphological reduction eliminated objects that were either much larger or much smaller than the average object size, but had trouble differentiating between TPs and FPs of similar sizes. Therefore, a classifier that can better differentiate between these similar shaped objects was still needed. This was achieved, to a large extent, with texture-based feature classification.

An LDA classifier based on SGLD texture features extracted from ROIs defined by each detected object has proven to be effective in differentiating between similar shaped objects. The training and test FROC performance curves following final texture classification are shown in Fig. 9. In addition, the number of FPs per image for different TP fractions are given in Tables III and IV for the two curves. As discussed in the Methods section, the mammograms were divided into ten independent groups and a 9:1 training-to-test ratio was employed in the classification. Therefore, the test value for an object was its single testing score, and its training value was the average of the scores obtained for the object during training with the nine different training group combinations. The first point to note in Tables III and IV is that the initial TP detection fraction has increased from 97% in Table I to 98% (i.e., 247 total masses were detected). This is due to the change in the definition of a TP with the texture ROIs. The additional mass was detected because in one of the seven mammograms where no object overlapped the mass centroid, an object ROI overlapped with at least 50% of the mass. The texture classification was able to reduce the number of FPs per image from an initial value of 35.5 to approximately 19 without the loss of any TPs, achieving a 45% reduction. While the number of FPs is still large, it indicates that the more computationally intensive texture classification performs better than morphological reduction. Additional reduction in FPs can be achieved with lower TP thresholds. For example, at a 90% TP fraction the FPs decreased to 4.2 per image and at an 80% TP level the FPs decreased to 2.0 per image. Comparing with our previously reported two-stage DWCE edge detection segmentation technique (discussed in Sec. I), we obtained improved performance at all TP levels despite the fact that the data set increased from 168 to 253 mammograms and fewer FP reduction stages were used with the new segmentation technique.

The results presented in this paper do not reflect results from a completely independent test set because the feature selection and the selection of morphological classification thresholds were based on the entire image set. This was necessary to obtain the best possible mass statistics from the limited data set at the intermediate stages of the algorithm. A database is currently being collected so that independent testing can be performed using the proposed method.


V.  CONCLUSION 

We have reported on an improved mass detection scheme. The scheme employs DWCE segmentation and object-based region growing. Its performance has achieved a 90% TP detection level with 4.2 FPs per image and an 80% TP detection level with 2.0 FPs per image with a diverse database of 253 mammograms. The addition of region growing improved the borders of detected objects and reduced merging between adjacent or overlapping structures. This improved the morphological information extracted from the detected breast masses and the differentiation between masses and normal tissues. The FP reduction was also simplified to a single stage of morphological feature classification and a single stage of SGLD texture feature classification. It is expected that a simplified FP reduction scheme has the potential to generalize better than a more complicated scheme when CAD is implemented in a clinical setting. This breast mass segmentation scheme provided improved FROC performance compared to our previously reported two-stage DWCE technique. Further investigations are under way to improve the region-growing segmentation by analyzing different growing methods that may improve the border definition of the detected structures, as well as to develop new object features that may further differentiate masses from normal structures. Preclinical testing of this algorithm on a large set of independent mammograms will also be conducted.


Table IV. Summary of the test FROC result depicted in Fig. 9. The table contains the number of FPs per image for different TP fractions along with the percentage of FPs reduced at each TP level relative to the initial value of 19.2 FPs per image. The first entry in the table is the reduction achieved without missing any additional breast masses.

TP fraction   FPs/image   FP reduction
98%           19.2        (not legible)
95%           6.7         (not legible)
90%           4.2         78%
80%           2.0         90%
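The FP reduction column of Table IV can be cross-checked from the FPs-per-image column, since the caption defines the reduction relative to the initial 19.2 FPs per image. The short sketch below performs that arithmetic; the variable and function names are illustrative, and rounding to the nearest integer is an assumption about how the table values were reported.

```python
# Sketch: FP reduction relative to the initial 19.2 FPs per image (Table IV).
initial_fps = 19.2
fps_per_image = {"98%": 19.2, "95%": 6.7, "90%": 4.2, "80%": 2.0}

def fp_reduction(fps, baseline=initial_fps):
    """Percentage of FPs removed relative to the baseline, rounded to an integer."""
    return round(100 * (baseline - fps) / baseline)

reductions = {tp: fp_reduction(fps) for tp, fps in fps_per_image.items()}
```

The values obtained for the 90% and 80% TP levels (78 and 90) agree with the two reduction entries that survive in the table. The same arithmetic gives about 65 for the 95% TP level, which is a computed value and not a figure read from the source.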


ACKNOWLEDGMENTS

This work is supported by the Whitaker Foundation (NP), USPHS Grant No. CA 48129, a Career Development Award DAMD 17-96-1-6012 (BS), and research grant DAMD 17-96-1-6254 from the U.S. Army Medical Research and Materiel Command. The content of this publication does not necessarily reflect the position of the government, and no official endorsement of any equipment or product should be inferred.




APPENDIX  A:  MORPHOLOGICAL  FEATURE 
DEFINITIONS 


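A set of 11 features is used in morphological FP reduction. Ten of these features are based solely on the binary object defined by the segmentation. The other feature utilizes the original gray scale values inside and surrounding the segmented object. An individual object segmented from image $F(x,y)$ is defined as:

$$F_{\mathrm{obj}_i}(x,y) = \begin{cases} 1, & F(x,y) \text{ is a pixel in object } i, \\ 0, & \text{otherwise.} \end{cases} \qquad (A1)$$

In addition, $F_{\mathrm{BB}_i}(x,y)$ defines the pixels contained in the smallest bounding box completely containing object $i$ and $F_{\mathrm{Eqv}_i}(x,y)$ defines the pixels of the circle with the same area as $F_{\mathrm{obj}_i}(x,y)$ and centered at its centroid location. The radius of $F_{\mathrm{Eqv}_i}(x,y)$ is given by

$$r_{\mathrm{Eqv}_i} = \sqrt{\frac{\mathrm{area}(F_{\mathrm{obj}_i})}{\pi}}. \qquad (A2)$$

Five features are based on the normalized radial length (NRL), defined as the Euclidean distance from an object's centroid to each of its edge pixels and normalized relative to the maximum radial length for the object. This results in a NRL vector for each object $i$ given as

$$R_i = \{\, r_{i,j} : 0 \le j \le N_{e,i} - 1 \,\}, \qquad (A3)$$

where $N_{e,i}$ is the number of edge pixels in the object and $r_{i,j} \le 1$. The histogram of the normalized radial length is also calculated and is given by

$$P_i = \{\, \mathrm{prob}_{i,k} : 0 \le k \le N_h - 1 \,\}, \qquad (A4)$$

where $N_h$ is the number of bins used in the histogram. Using these basic definitions, the morphological features are defined as follows.

Perimeter:

$$\mathrm{Perim}_i = \sum_{\forall (x,y)} p_i(x,y), \qquad (A5)$$

where

$$p_i(x,y) = \begin{cases} 1, & F_{\mathrm{obj}_i}(x,y) \text{ is an edge pixel of object } i, \\ 0, & \text{otherwise.} \end{cases}$$

Area:

$$\mathrm{Area}_i = \sum_{\forall (x,y)} F_{\mathrm{obj}_i}(x,y). \qquad (A6)$$

Perimeter-to-area ratio:

$$\mathrm{PAR}_i = \frac{\mathrm{Perim}_i}{\mathrm{Area}_i}. \qquad (A7)$$

Circularity:

$$\mathrm{Circ}_i = \frac{\sum_{\forall (x,y)} F_{\mathrm{obj}_i}(x,y) \cdot F_{\mathrm{Eqv}_i}(x,y)}{\mathrm{Area}_i}. \qquad (A8)$$

Rectangularity:

$$\mathrm{Rect}_i = \frac{\mathrm{Area}_i}{\sum_{\forall (x,y)} F_{\mathrm{BB}_i}(x,y)}. \qquad (A9)$$

NRL mean:

$$\mu_{\mathrm{NRL}_i} = \frac{1}{N_{e,i}} \sum_{j=0}^{N_{e,i}-1} r_{i,j}. \qquad (A10)$$

NRL standard deviation:

$$\sigma_{\mathrm{NRL}_i} = \sqrt{\frac{1}{N_{e,i}} \sum_{j=0}^{N_{e,i}-1} \left( r_{i,j} - \mu_{\mathrm{NRL}_i} \right)^2}. \qquad (A11)$$

NRL entropy:

$$E_{\mathrm{NRL}_i} = -\sum_{k=0}^{N_h-1} \mathrm{prob}_{i,k} \log_2 (\mathrm{prob}_{i,k}). \qquad (A12)$$

NRL area ratio:

$$\mathrm{AreaR}_i = \frac{1}{N_{e,i}\, \mu_{\mathrm{NRL}_i}} \sum_{j=0}^{N_{e,i}-1} \left( r_{i,j} - \mu_{\mathrm{NRL}_i} \right) : r_{i,j} > \mu_{\mathrm{NRL}_i}. \qquad (A13)$$

NRL zero-crossing count:

$$\mathrm{ZCC}_i = \sum_{j=0}^{N_{e,i}-1} z_{i,j}, \qquad (A14)$$

where

$$z_{i,j} = \begin{cases} 1, & (r_{i,j-1} > \mu_{\mathrm{NRL}_i}) \wedge (r_{i,j+1} \le \mu_{\mathrm{NRL}_i}), \\ 1, & (r_{i,j-1} \le \mu_{\mathrm{NRL}_i}) \wedge (r_{i,j+1} > \mu_{\mathrm{NRL}_i}), \\ 0, & \text{otherwise.} \end{cases}$$

Contrast:

$$\mathrm{Cont}_i = \frac{\bar{g}_{\mathrm{in}_i}}{\bar{g}_{\mathrm{out}_i}}, \qquad (A15)$$

where $\bar{g}_{\mathrm{in}_i}$ is the average gray value inside object $i$ and $\bar{g}_{\mathrm{out}_i}$ is the average gray value of the one-pixel wide background surrounding the object.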
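The five NRL features of Appendix A can be computed from an ordered list of edge pixels and the object centroid, as in the sketch below. Several details are assumptions rather than statements from the paper: the histogram uses ten equal-width bins on [0, 1], the standard deviation divides by the number of edge pixels, and the zero-crossing count wraps around the closed contour. The function name is hypothetical.

```python
# Sketch: normalized radial length (NRL) features for one object.
# ASSUMPTIONS: 10 equal-width histogram bins, population standard deviation,
# and wrap-around indexing for the zero-crossing count.
import math

def nrl_features(edge_pixels, centroid, n_bins=10):
    """Return NRL mean, std, entropy, area ratio and zero-crossing count.

    edge_pixels: (x, y) border pixels, ordered along the contour.
    centroid:    (x, y) centroid of the object.
    """
    cx, cy = centroid
    radial = [math.hypot(x - cx, y - cy) for x, y in edge_pixels]
    r_max = max(radial)
    r = [v / r_max for v in radial]              # normalized radial lengths
    n = len(r)
    mean = sum(r) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in r) / n)
    hist = [0] * n_bins                          # histogram of r over [0, 1]
    for v in r:
        hist[min(int(v * n_bins), n_bins - 1)] += 1
    probs = [h / n for h in hist]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    area_ratio = sum(v - mean for v in r if v > mean) / (n * mean)
    # A crossing is counted at j when the two neighbors of edge pixel j lie on
    # opposite sides of the mean radial length.
    zcc = sum(1 for j in range(n)
              if (r[j - 1] > mean) != (r[(j + 1) % n] > mean))
    return {"mean": mean, "std": std, "entropy": entropy,
            "area_ratio": area_ratio, "zcc": zcc}

# Border of a small square sampled at its corners and edge midpoints.
square = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
features = nrl_features(square, centroid=(0, 0))
```

For a perfectly circular border all normalized radial lengths equal 1, so the standard deviation, entropy, area ratio, and zero-crossing count all vanish, which is the sense in which these features measure departure from a round shape.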


APPENDIX  B:  SGLD  TEXTURE  FEATURE 
DEFINITIONS 

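Global and local multiresolution texture features are based on the spatial gray level dependence (SGLD) matrix. An element of the SGLD matrix, $p_{d,\theta}(i,j)$, is defined as the joint probability that gray levels $i$ and $j$ occur at a given interpixel separation $d$ and direction $\theta$. In this study, $n$ is defined as the number of gray levels in an image. A total of 13 different texture measures were defined for each SGLD matrix. They were defined as follows.

Energy:

$$E = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \left[ p_{d,\theta}(i,j) \right]^2. \qquad (B1)$$

Correlation:

$$R = \frac{\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (i - \mu_x)(j - \mu_y)\, p_{d,\theta}(i,j)}{\sigma_x \sigma_y}, \qquad (B2)$$

where

$$\mu_x = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} i\, p_{d,\theta}(i,j), \qquad (B3)$$

$$\mu_y = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} j\, p_{d,\theta}(i,j), \qquad (B4)$$

$$\sigma_x = \sqrt{\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (i - \mu_x)^2\, p_{d,\theta}(i,j)}, \qquad (B5)$$

and

$$\sigma_y = \sqrt{\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (j - \mu_y)^2\, p_{d,\theta}(i,j)}. \qquad (B6)$$

Entropy:

$$H = -\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} p_{d,\theta}(i,j) \log_2 \left( p_{d,\theta}(i,j) \right). \qquad (B7)$$

Inertia:

$$I_n = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (i - j)^2\, p_{d,\theta}(i,j). \qquad (B8)$$

Inverse difference moment:

$$\mathrm{IDM} = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \frac{p_{d,\theta}(i,j)}{1 + (i - j)^2}. \qquad (B9)$$

Sum average:

$$\mu_{x+y} = \sum_{k=0}^{2n-2} k\, p_{x+y}(k), \qquad (B10)$$

where

$$p_{x+y}(k) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} p_{d,\theta}(i,j), \quad i + j = k \text{ and } k = 0, \ldots, 2n-2. \qquad (B11)$$

Sum variance:

$$\sigma^2_{x+y} = \sum_{k=0}^{2n-2} (k - \mu_{x+y})^2\, p_{x+y}(k). \qquad (B12)$$

Sum entropy:

$$H_{x+y} = -\sum_{k=0}^{2n-2} p_{x+y}(k) \log_2 \left( p_{x+y}(k) \right). \qquad (B13)$$

Difference average:

$$\mu_{x-y} = \sum_{k=0}^{n-1} k\, p_{x-y}(k), \qquad (B14)$$

where

$$p_{x-y}(k) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} p_{d,\theta}(i,j), \quad |i - j| = k \text{ and } k = 0, \ldots, n-1. \qquad (B15)$$

Difference variance:

$$\sigma^2_{x-y} = \sum_{k=0}^{n-1} (k - \mu_{x-y})^2\, p_{x-y}(k). \qquad (B16)$$

Difference entropy:

$$H_{x-y} = -\sum_{k=0}^{n-1} p_{x-y}(k) \log_2 \left( p_{x-y}(k) \right). \qquad (B17)$$

Information measure of correlation 1:

$$\mathrm{IMC1} = \frac{H - H_1}{\max\{H_x, H_y\}}. \qquad (B18)$$

Information measure of correlation 2:

$$\mathrm{IMC2} = \sqrt{1 - \exp\left[ -2 (H_2 - H) \right]}, \qquad (B19)$$

where

$$H_x = -\sum_{i=0}^{n-1} p_x(i) \log_2 \left( p_x(i) \right), \qquad (B20)$$

$$H_y = -\sum_{j=0}^{n-1} p_y(j) \log_2 \left( p_y(j) \right), \qquad (B21)$$

$$H_1 = -\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} p_{d,\theta}(i,j) \log_2 \left( p_x(i)\, p_y(j) \right), \qquad (B22)$$

and

$$H_2 = -\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} p_x(i)\, p_y(j) \log_2 \left( p_x(i)\, p_y(j) \right), \qquad (B23)$$

with the marginal distributions $p_x(i) = \sum_{j=0}^{n-1} p_{d,\theta}(i,j)$ and $p_y(j) = \sum_{i=0}^{n-1} p_{d,\theta}(i,j)$.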

Abstract. A genetic algorithm (GA) based feature selection method was developed for the
design  of  high-sensitivity  classifiers,  which  were  tailored  to  yield  high  sensitivity  with  high 
specificity.  The  fitness  function  of  the  GA  was  based  on  the  receiver  operating  characteristic 
(ROC)  partial  area  index,  which  is  defined  as  the  average  specificity  above  a  given  sensitivity 
threshold.  The  designed  GA  evolved  towards  the  selection  of  feature  combinations  which  yielded 
high  specificity  in  the  high-sensitivity  region  of  the  ROC  curve,  regardless  of  the  performance  at 
low  sensitivity.  This  is  a  desirable  quality  of  a  classifier  used  for  breast  lesion  characterization, 
since  the  focus  in  breast  lesion  characterization  is  to  diagnose  correctly  as  many  benign  lesions 
as  possible  without  missing  malignancies.  The  high-sensitivity  classifier,  formulated  as  the 
Fisher’s  linear  discriminant  using  GA-selected  feature  variables,  was  employed  to  classify 
255  biopsy-proven  mammographic  masses  as  malignant  or  benign.  The  mammograms  were 
digitized  at  a  pixel  size  of  0.1  mm  x  0.1  mm,  and  regions  of  interest  (ROIs)  containing  the 
biopsied  masses  were  extracted  by  an  experienced  radiologist.  A  recently  developed  image 
transformation  technique,  referred  to  as  the  rubber-band  straightening  transform,  was  applied 
to  the  ROIs.  Texture  features  extracted  from  the  spatial  grey-level  dependence  and  run-length 
statistics  matrices  of  the  transformed  ROIs  were  used  to  distinguish  malignant  and  benign  masses. 
The  classification  accuracy  of  the  high-sensitivity  classifier  was  compared  with  that  of  linear 
discriminant  analysis  with  stepwise  feature  selection  (LDAsfs).  With  proper  GA  training,  the 
ROC  partial  area  of  the  high-sensitivity  classifier  above  a  true-positive  fraction  of  0.95  was 
significantly larger than that of LDAsfs, although the latter provided a higher total area (A_z)
under  the  ROC  curve.  By  setting  an  appropriate  decision  threshold,  the  high-sensitivity  classifier 
and  LDAsfs  correctly  identified  61%  and  34%  of  the  benign  masses  respectively  without  missing 
any  malignant  masses.  Our  results  show  that  the  choice  of  the  feature  selection  technique  is 
important  in  computer-aided  diagnosis,  and  that  the  GA  may  be  a  useful  tool  for  designing 
classifiers  for  lesion  characterization. 


1.  Introduction 

Due  to  its  high  sensitivity,  mammography  is  usually  the  first  radiological  examination 
used  for  the  early  detection  of  malignant  breast  lesions.  However,  the  positive  predictive 
value  (PPV)  of  mammographic  diagnosis  (ratio  of  the  number  of  malignancies  to  the  total 
number  of  biopsy  recommendations)  is  not  high.  Biopsies  performed  for  mammographically 
suspicious  non-palpable  breast  masses  had  PPVs  of  20  to  30%  in  three  studies  (Hermann 
et  al  1987,  Hall  et  al  1988,  Jacobson  and  Edeiken  1990).  To  reduce  health-care  costs 
and patient morbidity, it is desirable to increase the PPV of mammographic diagnosis
while maintaining its sensitivity of cancer detection. Computerized mammographic analysis
methods  can  potentially  aid  radiologists  in  achieving  this  goal. 

In  recent  years,  several  researchers  have  developed  new  techniques  for  the  classification 
of  mammographic  masses  based  on  computer-extracted  features  (Brzakovic  et  al  1990, 
Kilday  et  al  1993,  Huo  et  al  1995,  Pohlman  et  al  1996,  Rangayyan  et  al  1996,  Sahiner 
et  al  1996a,  1997,  1998).  Kilday  et  al  (1993)  classified  masses  using  morphological 
features  and  patient  age.  Brzakovic  et  al  (1990)  classified  suspected  lesions  using  their 
shape  and  intensity  variations.  Huo  et  al  (1995)  developed  a  technique  to  quantify  the 
degree  of  spiculation  of  a  lesion,  and  classified  masses  as  malignant  and  benign  using 
these  spiculation  measures.  Pohlman  et  al  (1996)  developed  a  region  growing  algorithm 
for  tumour  segmentation,  and  used  features  describing  the  tumour  shape  for  classification. 
Rangayyan  et  al  (1996)  used  an  edge  acutance  measure  extracted  from  the  grey-scale 
intensity  along  the  normal  direction  to  the  mass  shape,  as  well  as  moments  to  classify 
masses.  We  have  developed  the  rubber-band  straightening  transform  (RBST)  for  facilitating 
the  extraction  of  effective  texture  features,  and  used  the  texture  features  extracted  from  the 
transformed  image  for  classification  (Sahiner  et  al  1996a,  1997,  1998). 

A  common  characteristic  of  the  above  approaches  is  that  the  lesion  is  first  segmented 
from  the  surrounding  tissue,  and  then  features  are  extracted  from  the  shape  and  grey-level 
characteristics  of  the  lesion  and  the  surrounding  tissue.  The  extracted  features  usually 
represent  a  mathematical  description  of  characteristics  that  are  helpful  for  distinguishing 
malignant  and  benign  lesions.  When  several  features  are  extracted  for  classification,  it  may 
be  difficult  to  predict  which  features  or  feature  combinations  will  result  in  more  accurate 
classification.  For  example,  it  is  known  that  the  borders  of  malignant  masses  tend  to  be  more 
irregular  than  those  of  benign  masses;  therefore,  it  is  expected  that  the  normalized  radial 
lengths  (Kilday  et  al  1993)  carry  useful  information  about  the  probability  of  malignancy  of 
a  mass.  However,  since  the  normalized  radial  lengths,  and  especially  the  features  extracted 
from  them  (for  example  variance  and  entropy),  do  not  exactly  measure  irregularity  but 
instead  merge  information  from  a  combination  of  border  characteristics,  it  is  difficult  to 
predict  which  feature  combination  will  yield  the  highest  classification  accuracy  when  used  in 
a  statistical  classifier.  It  is  known  that  the  inclusion  of  inappropriate  features  may  adversely 
affect  classifier  performance,  especially  when  the  training  set  is  not  sufficiently  large  (Raudys 
and  Jain  1991,  Sahiner  et  al  1996c).  Therefore,  in  many  situations,  one  must  face  the  task 
of  selecting  a  subset  of  effective  features  for  classification. 

One  systematic  method  for  feature  selection  is  linear  discriminant  analysis  with  stepwise 
feature  selection  (LDAsfs),  which  has  been  applied  to  feature  selection  problems  in  computer- 
aided  diagnosis  (Chan  et  al  1995,  Wei  et  al  1995).  LDAsfs  is  an  iterative  procedure,  where 
one  feature  is  entered  into  or  removed  from  the  selected  feature  pool  at  each  step  by 
analysing  its  effect  on  a  selection  criterion.  The  nature  of  the  stepwise  selection  procedure 
makes  it  imperative  that  the  selection  criterion  be  a  statistical  distance  measure  between  the 
two  groups  to  be  classified.  The  Wilks  lambda  and  the  Mahalanobis  distance  are  commonly 
used  measures.  Genetic  algorithm  (GA)  based  feature  selection,  which  is  capable  of  using 
any  numerically  computed  criterion  for  its  fitness  function,  is  a  slower  but  more  versatile 
method  than  stepwise  feature  selection.  We  have  demonstrated  that  when  the  GA  fitness 
criterion  is  related  to  the  area  under  the  receiver  operating  characteristic  (ROC)  curve, 
GA-based feature selection yields slightly more effective features than LDAsfs (Sahiner et al
1996c). 

In  the  task  of  lesion  characterization,  the  cost  of  missing  a  malignancy  is  very  high. 
Therefore,  the  performance  of  a  classifier  in  the  high-sensitivity  (high  true-positive  fraction) 
region of the ROC curve is more important than the overall area under the ROC curve. In
other words, if a classifier is to be designed for breast lesion characterization, the specificity
at  high  levels  of  sensitivity  is  much  more  important  than  the  specificity  at  low  levels  of 
sensitivity.  Recently,  Jiang  et  al  (1996)  developed  a  method  for  describing  an  ROC  partial 
area  index  that  may  be  useful  as  a  performance  measure  in  lesion  characterization  problems. 
Since a feature (or feature combination) that can provide a large overall A_z (or a small Wilks’
lambda and a large Mahalanobis distance) may not provide a large partial ROC area, it is important
to  develop  a  feature  selection  method  for  the  design  of  high-sensitivity  classifiers.  The 
partial  ROC  area  is  potentially  a  good  feature  selection  criterion  for  this  application.  The 
flexibility  of  a  GA  in  the  selection  of  its  fitness  function  allows  this  index  to  be  incorporated 
for  feature  selection. 
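
As an illustration of the index discussed above, the following Python sketch computes the average specificity over the sensitivity range above a chosen threshold. It is only a sketch under stated assumptions: it integrates the empirical (piecewise-linear) ROC curve, whereas a fitted ROC model may be used in practice, and the names `roc_points`, `partial_area_index` and `tpf0` are hypothetical.

```python
def roc_points(scores, labels):
    """Return empirical ROC operating points as (FPF, TPF) pairs.

    labels: 1 = malignant (positive), 0 = benign (negative).
    A higher score is taken to mean "more likely malignant".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases tied at this score cross the decision threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_area_index(scores, labels, tpf0):
    """Average specificity (1 - FPF) over sensitivities in [tpf0, 1]."""
    if not 0.0 <= tpf0 < 1.0:
        raise ValueError("tpf0 must lie in [0, 1)")
    points = roc_points(scores, labels)
    area = 0.0
    for (f_lo, t_lo), (f_hi, t_hi) in zip(points, points[1:]):
        if t_hi <= tpf0:
            continue  # segment lies entirely below the sensitivity threshold
        if t_lo < tpf0:
            # Clip the segment at TPF = tpf0 by linear interpolation.
            frac = (tpf0 - t_lo) / (t_hi - t_lo)
            f_lo = f_lo + frac * (f_hi - f_lo)
            t_lo = tpf0
        # Trapezoid of specificity integrated along the TPF axis.
        area += (t_hi - t_lo) * (1.0 - 0.5 * (f_lo + f_hi))
    return area / (1.0 - tpf0)
```

With this normalization a classifier that keeps perfect specificity throughout the high-sensitivity region scores 1, regardless of how it behaves at low sensitivity.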

In  this  study,  we  developed  a  methodology  to  design  high-sensitivity  classifiers.  The 
design  process  was  illustrated  by  the  task  of  classifying  masses  on  digitized  mammograms 
as  malignant  or  benign.  A  GA-based  algorithm  with  the  ROC  partial  area  index  as  the 
feature  selection  criterion,  in  combination  with  Fisher’s  linear  discriminant,  was  used  for 
the  design  of  this  classifier.  Texture  features  extracted  from  REST  images  (Sahiner  et  al 
1998)  were  used  for  classification.  The  performance  of  the  high-sensitivity  classifier  was 
compared  with  the  performance  achieved  by  LDAsfs  using  the  Wilks  lambda  as  the  feature 
selection  criterion. 

2.  Materials  and  methods 

2.1. Data set

The  mammograms  used  in  this  study  were  selected  from  the  files  of  patients  at  the  Radiology 
Department  of  the  University  of  Michigan  who  had  undergone  biopsy.  The  mammograms 
were  acquired  with  dedicated  mammographic  systems  with  0.3  mm  focal  spots,  molybdenum 
anodes,  0.03  mm  thick  molybdenum  filters  and  5:1  reciprocating  grids.  For  recording  the 
images,  a  Kodak  MinR/MRE  screen/film  system  with  extended  cycle  processing  was  used. 
The  criterion  for  inclusion  of  a  mammogram  in  the  data  set  was  that  the  mammogram 
contained  a  biopsy-proven  mass,  and  that  approximately  equal  numbers  of  malignant  and 
benign  masses  were  present  in  the  data  set. 

Our  data  set  consisted  of  255  mammograms  from  104  patients.  For  most  of  the  patients 
we  had  two  mammograms  in  the  data  set,  which  were  the  craniocaudal  and  the  mediolateral 
oblique  views.  However,  for  some  of  the  patients,  extra  views  such  as  lateral  and  oblique 
views  were  included  in  the  data  set.  There  were  128  mammograms  with  benign  masses, 
of  which  8  were  spiculated  based  upon  radiologist  interpretation,  and  127  mammograms 
with  malignant  masses,  of  which  62  were  spiculated.  Of  the  104  patients  evaluated  in 
this  study,  48  had  malignant  masses.  The  probability  of  malignancy  of  the  biopsied  mass 
on  each  mammogram  was  ranked  by  a  Mammography  Quality  Standards  Act  (MQSA) 
approved  radiologist  experienced  in  mammographic  interpretation  on  a  scale  of  1  to  10.  A 
ranking  of  1  corresponded  to  the  masses  with  the  most  benign  mammographic  appearance, 
and  a  ranking  of  10  corresponded  to  the  masses  with  the  most  malignant  mammographic 
appearance.  The  distribution  of  the  malignancy  ranking  of  the  masses  is  shown  in  figure  1. 
The  true  pathology  of  the  masses  was  determined  by  biopsy  and  histological  analysis. 

The mammograms in the data set were digitized with a Lumisys DIS-1000 laser scanner
at a pixel resolution of 0.1 mm x 0.1 mm and 4096 grey levels. The digitizer was calibrated
so that grey-level values were linearly proportional to the optical density (OD) within the
range of 0.1 to 2.8 OD units, with a slope of 0.001 OD/pixel value. Outside this range,
the slope of the calibration curve decreased gradually, with the OD range extending to 3.5.
The pixel values were linearly converted before they were stored on the computer so that a
high pixel value represented a low optical density.

Figure 1. The distribution of the malignancy ranking of the masses in our data set, as determined
by a radiologist experienced in mammographic interpretation: 1, very likely benign; 10, very
likely malignant.

The  location  of  the  biopsied  mass  was  identified  by  the  radiologist,  and  a  region  of 
interest  (ROI)  containing  the  biopsied  mass  was  extracted  for  computerized  analysis.  The 
size  of  the  ROI  was  allowed  to  vary  according  to  the  lesion  size.  The  extracted  ROIs 
contained  a  non-uniform  background,  which  depended  on  the  overlapping  breast  structures 
and  the  location  of  the  lesion  on  the  mammogram.  The  non-uniform  background  is  not 
related  to  mass  malignancy,  but  may  affect  the  segmentation  and  feature  extraction  results 
used  in  our  computerized  analysis.  To  reduce  the  background  non-uniformity,  an  automated 
background  correction  technique  was  applied  to  each  ROI  as  the  very  first  step  in  our 
analysis.  Details  and  examples  of  our  background  correction  technique  can  be  found  in  the 
literature  (Sahiner  et  al  1996b). 

2.2. The rubber-band straightening transform (RBST)

In this study, the classification of malignant and benign masses was based on the textural
differences of their mammographic appearance. We have previously designed a rubber-band
straightening transform (RBST) which was found to facilitate the extraction of texture
features from the region surrounding a mammographic mass. The image transformation
performed by the RBST is depicted in figure 2, and a block diagram of different stages of the
RBST is given in figure 3. A detailed discussion of the transform can be found in the
literature (Sahiner et al 1996a, 1997, 1998). For completeness, a brief description is given below.

The RBST transforms a band of pixels surrounding a mass onto the Cartesian plane.
The four basic steps in the RBST are mass segmentation, edge enumeration, computation
of normals and interpolation. A modified K-means clustering algorithm (Sahiner et al
1995) was used for segmentation. The parameters of the segmentation algorithm were
chosen so that the segmented region was slightly smaller than the actual size of the mass.
After clustering, one or more objects would be segmented in the ROI. If more than
one object was segmented, the largest connected object was selected. The selected object
was then filled, grown in a local neighbourhood, and eroded and dilated with morphological
operators. The implementation details of these steps have been described elsewhere (Sahiner
et al 1998). After the outline of the mass was obtained, an edge enumeration algorithm
assigned a pixel number to each border pixel of the mass, such that neighbouring pixels
were assigned consecutive numbers. The computation of normals depended on the output
of the edge enumeration algorithm. The normal L(i) at border pixel i was determined
as the normal to the line joining border pixels i - K and i + K. The choice of the
constant K represents a trade-off between a noisy estimate of the normal direction (small
K) and an estimate that misses fine variations in the normal direction (large K). In order
to determine the constant K to be used in this study, we selected a small subset of images
from our database, and plotted the normal direction obtained by using different values of K
superimposed on the segmented image. By performing a visual comparison of the computed
normal direction to what was perceived to be the true normal direction, it was empirically
found that K = 12 resulted in a satisfactory normal estimation. In the interpolation step,
the value of the pixel in row j, column i of the RBST image was found as follows. Let
p(i, j) denote the location in the original image at a distance j along L(i) from border
pixel i. The two closest pixels in the original ROI to location p(i, j) were identified, and
the (i, j)th pixel value of the RBST image was defined as the distance-weighted average of
these two pixel values.

Figure 2. The formation of the RBST image.

Figure 3. Block diagram of the stages of RBST image computation.
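
The computation of normals described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, the border is assumed to be a closed, consecutively numbered list of (x, y) points, and the sign convention (which side of the chord counts as "outward") is an assumption that depends on the traversal direction and on the image coordinate system.

```python
import math


def border_normal(border, i, k=12):
    """Unit normal at border point i, taken as the perpendicular to the
    line joining border points i - K and i + K (indices wrap around)."""
    n = len(border)
    if n < 2 * k + 1:
        raise ValueError("border is too short for this value of K")
    x_prev, y_prev = border[(i - k) % n]
    x_next, y_next = border[(i + k) % n]
    tx, ty = x_next - x_prev, y_next - y_prev
    length = math.hypot(tx, ty)
    if length == 0.0:
        raise ValueError("points i - K and i + K coincide")
    # Rotate the chord direction by -90 degrees; for a counter-clockwise
    # border in standard (y up) axes this points away from the object.
    return (ty / length, -tx / length)


def sample_location(border, i, j, k=12):
    """Location p(i, j): a distance j along the normal from border point i."""
    nx, ny = border_normal(border, i, k)
    x, y = border[i]
    return (x + j * nx, y + j * ny)
```

For a circular border the chord between points i - K and i + K is perpendicular to the radius through point i, so the computed normal coincides with the radial direction, which gives a convenient sanity check.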

The  width  of  the  band  transformed  by  the  RBST  was  chosen  as  40  pixels  in  this  study, 
which  corresponded  to  4  mm  on  the  mammogram.  An  example  of  the  background-corrected 
ROI,  the  segmented  and  morphologically  filtered  mass  shape,  and  the  RBST  image  are 
shown  in  figure  4. 


Figure 4. (a) The original mammographic ROI. (b) The segmented and morphologically filtered
mass shape (white), and the 40-pixel-wide band around it (grey). For the purpose of illustration,
the normals computed at i = 0, 20 and 50 are also shown. (c) The RBST image. Notice that due
to the position of the first normal location (i = 0), the calcifications c1 and c2 on the original
ROI appear at the right and the left of the RBST image respectively. The pathological analysis
indicated that this was an invasive ductal and intraductal carcinoma.


2.3.  Texture  features 

The texture features used for the classification of the malignant and benign masses were
spatial grey-level dependence (SGLD) and run length statistics (RLS) features. These
features were extracted from SGLD and RLS matrices, which were constructed from the
RBST images as described below.

2.3.1. SGLD features. The (i, j)th element of the SGLD matrix p_{θ,d}(i, j) represents the
probability that grey levels i and j occur at an angle θ and a distance d with respect to each
other.  The  use  of  SGLD  matrices  for  feature  extraction  was  motivated  by  the  assumption 
that  texture  information  is  contained  in  the  average  spatial  relationships  between  the  grey- 
level  tones  in  the  image  (Haralick  et  al  1973).  The  features  extracted  from  SGLD  matrices 
of  mammographic  ROIs  have  been  shown  to  be  useful  in  classification  of  mass  and  normal 
tissue,  and  malignant  and  benign  masses  or  microcalcifications  in  computer-aided  diagnosis 
(CAD)  (Chan  et  al  1995,  1997a,  Wei  et  al  1995,  Sahiner  et  al  1996b,  1998). 

In this study, four different directions (θ = 0°, 45°, 90° and 135°) and ten different
pixel pair distances (d = 1, 2, 3, 4, 6, 8, 10, 12, 16 and 20) were used for the construction
of SGLD matrices from RBST images. The total number of SGLD matrices was therefore
40.  Based  on  our  previous  studies  (Chan  et  al  1995),  a  bit  depth  of  eight  bits  was  used  in 
the  SGLD  matrix  construction. 

A number of SGLD features, which describe the shape of the SGLD matrices, can be
extracted from each SGLD matrix. In this study, we extracted eight such features, which
were also used in our previous studies (Chan et al 1995, Wei et al 1995, Sahiner et al
1998). These texture features were correlation, difference entropy, energy, entropy, inertia,
inverse difference moment, sum average and sum entropy. This resulted in the computation
of 320 SGLD features per RBST image. These features characterize information such as
homogeneity,  contrast  and  structural  linearity  in  the  images.  However,  it  is  difficult  to 
establish  a  one-to-one  correspondence  between  these  qualitative  image  characteristics  and 
the  extracted  texture  features  (Haralick  et  al  1973).  The  definitions  of  the  SGLD  features 
used  in  this  study  can  be  found  in  the  literature  (Haralick  et  al  1973,  Chan  et  al  1995,  Wei 
et  al  1995). 
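
To make the construction concrete, the following Python sketch builds an SGLD matrix for one direction and distance and evaluates four of the eight features. It is an illustration under stated assumptions, not the code used in the study: the image is assumed to be already quantized to integer grey levels (256 levels for the eight-bit depth mentioned above), pixel pairs are counted symmetrically as in Haralick et al (1973), the inverse difference moment uses the standard squared-difference form, and all function names are hypothetical.

```python
import math

# (row step, column step) of a unit displacement for each angle in degrees;
# rows are assumed to increase downwards.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def sgld_matrix(image, d, theta, levels):
    """Joint probability of grey levels i and j at distance d, angle theta."""
    d_row, d_col = (step * d for step in OFFSETS[theta])
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1  # count the pair in both orders so that
                counts[b][a] += 1  # the matrix is symmetric
                total += 2
    if total == 0:
        raise ValueError("no pixel pairs exist at this distance")
    return [[count / total for count in row] for row in counts]


def sgld_features(p):
    """Energy, entropy, inertia and inverse difference moment of matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }
```

Looping this over the four directions and ten distances listed above would yield the 40 matrices per image; the remaining features follow the same pattern.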

2.3.2. RLS features. The pixels along a given line in an image occasionally contain runs
of consecutive pixels that all have the same grey level. A grey-level run is defined as a set
of consecutive, collinear pixels in a given direction which have the same grey-level value.
A run length is the number of pixels in a grey-level run. The RLS matrix for a given image
describes the run length statistics in a given direction for each grey-level value in the image.
The (i, j)th element of the RLS matrix r_θ(i, j) represents the number of times that runs
of length j in the direction θ consisting of pixels with a grey level i exist in the image
(Weszka et al 1976).

The RLS matrices in this study were extracted from the vertical and horizontal gradient
magnitudes of the RBST images. The vertical and horizontal gradients were obtained by
filtering the RBST images with horizontally and vertically oriented Sobel filters (Jain 1989)
respectively. Examples of the gradient magnitude images are shown in figure 5. The RLS
matrices were obtained from each gradient magnitude image in two directions, θ = 0° and
θ = 90°. Therefore, a total of four RLS matrices were obtained for each RBST image.



Figure  5.  Gradient  magnitude  images  for  the  RBST  image  in  figure  4:  (a)  horizontal  gradient 
magnitude  image  and  (b)  vertical  gradient  magnitude  image. 


Based on our previous study, a bit depth of 5 was used for the computation of RLS
matrices (Sahiner et al 1998). Five RLS features, namely short runs emphasis, long runs
emphasis, grey-level non-uniformity, run length non-uniformity and run percentage were
extracted from each RLS matrix. This resulted in the computation of 20 RLS features per
RBST image. The definitions of these features can be found in the literature (Galloway
1975). It is possible to describe the general aspects of the relationship between the image
characteristics and the RLS feature values. For example, run percentage is low for images
with long linear structures, and grey-level non-uniformity is low for images where runs
are equally distributed throughout the grey levels (Galloway 1975). However, it is again
difficult to establish a one-to-one correspondence between these texture features and visual
image features.
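
The RLS matrix and the five features named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input is taken to be an image already quantized to integer grey levels (32 levels for the bit depth of 5 used here), only the 0° and 90° directions are handled, the feature formulas follow the usual Galloway (1975) definitions, and the function names are hypothetical.

```python
def run_length_matrix(image, levels, theta=0):
    """matrix[i][j - 1] = number of runs of length j with grey level i."""
    if theta == 0:
        lines = [list(row) for row in image]        # scan along rows
    elif theta == 90:
        lines = [list(col) for col in zip(*image)]  # scan along columns
    else:
        raise ValueError("only 0 and 90 degrees are sketched here")
    max_run = max(len(line) for line in lines)
    matrix = [[0] * max_run for _ in range(levels)]
    for line in lines:
        level, length = line[0], 1
        for value in line[1:]:
            if value == level:
                length += 1
            else:
                matrix[level][length - 1] += 1
                level, length = value, 1
        matrix[level][length - 1] += 1  # close the final run of the line
    return matrix


def rls_features(matrix, n_pixels):
    """Short/long runs emphasis, non-uniformities and run percentage."""
    n_runs = sum(sum(row) for row in matrix)
    lengths = range(1, len(matrix[0]) + 1)
    return {
        "short_runs_emphasis": sum(
            v / j ** 2 for row in matrix for j, v in zip(lengths, row)) / n_runs,
        "long_runs_emphasis": sum(
            v * j ** 2 for row in matrix for j, v in zip(lengths, row)) / n_runs,
        "grey_level_non_uniformity": sum(sum(row) ** 2 for row in matrix) / n_runs,
        "run_length_non_uniformity": sum(
            sum(col) ** 2 for col in zip(*matrix)) / n_runs,
        "run_percentage": n_runs / n_pixels,
    }
```

An image made of long uniform lines produces few runs relative to its pixel count, which is why the run percentage drops for long linear structures.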

2.4. Fisher’s linear discriminant and LDAsfs

For  a  two-class  problem,  Fisher’s  linear  discriminant  projects  the  multidimensional  feature 
space  onto  the  real  line  in  such  a  way  that  the  ratio  of  between-class  sum  of  squares  to 
within-class  sum  of  squares  is  maximized  after  the  projection  (Duda  and  Hart  1973).  This 
is  the  optimal  classifier  if  the  features  for  the  two  classes  have  a  multivariate  Gaussian 
distribution  with  equal  covariance  matrices  (Lachenbruch  1975).  It  has  been  shown  to  be  a 
reasonably  good  classifier  even  when  the  feature  distributions  for  the  two  classes  are  non- 
Gaussian  (Duda  and  Hart  1973).  Linear  discriminant  analysis  (LDA)  is  a  class  of  statistical 
techniques  based  on  Fisher’s  linear  discriminant. 
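
For a two-class problem the projection direction has the closed form w = S_w^{-1}(m_1 - m_0), where m_0 and m_1 are the class means and S_w is the pooled within-class scatter matrix. The Python sketch below computes it with a small Gaussian-elimination solver so that it needs no external libraries. The function names are hypothetical, and the sketch assumes S_w is non-singular, which in practice requires more training samples than features.

```python
def mean_vector(samples):
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def scatter_matrix(samples, mean):
    """Sum of outer products of the deviations from the class mean."""
    dim = len(mean)
    s = [[0.0] * dim for _ in range(dim)]
    for x in samples:
        diff = [xi - mi for xi, mi in zip(x, mean)]
        for a in range(dim):
            for b in range(dim):
                s[a][b] += diff[a] * diff[b]
    return s


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fisher_weights(benign, malignant):
    """Direction maximizing between-class over within-class scatter."""
    m0, m1 = mean_vector(benign), mean_vector(malignant)
    s0, s1 = scatter_matrix(benign, m0), scatter_matrix(malignant, m1)
    sw = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(s0, s1)]
    return solve_linear(sw, [b - a for a, b in zip(m0, m1)])


def discriminant_score(weights, x):
    """Project a feature vector onto the real line."""
    return sum(w * xi for w, xi in zip(weights, x))
```

Thresholding the resulting score gives a decision rule, and sweeping the threshold traces out the ROC curve of the classifier.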

When  the  training  data  size  is  limited,  the  inclusion  of  inappropriate  features  in  a 
classifier  may  reduce  the  test  accuracy  due  to  overtraining.  Therefore,  when  a  large  number 
of  features  are  available  for  a  classification  task,  it  is  necessary  to  select  a  subset  of  the 
most  effective  features  from  the  feature  pool.  LDAsfs  is  a  commonly  used  feature  selection 
method  (Lachenbruch  1975).  In  this  study,  the  performance  of  a  GA-based  high-sensitivity 
feature  selection  method  was  compared  with  that  of  stepwise  feature  selection. 

Wilks’  lambda,  which  is  defined  as  the  ratio  of  within-group  sum  of  squares  to  the  total 
sum  of  squares  (Lachenbruch  1975),  was  used  as  the  selection  criterion  for  the  stepwise 
feature  selection  method.  The  stepwise  feature  selection  algorithm  starts  with  no  selected 
features  at  step  0.  At  step  s  of  the  algorithm,  the  available  features  are  entered  into  the 
selected  feature  pool  one  at  a  time  during  feature  entry,  and  those  already  selected  are 
removed one at a time during feature removal. The significance of the change in the Wilks’
lambda, as determined by F-statistics, when a new feature is entered into the selected
feature pool is compared with a threshold F_in. The feature with the highest significance is
entered into the selected feature pool only if the significance is higher than F_in. Likewise, the
significance of the change in the Wilks’ lambda when a selected feature is removed from
the feature pool is compared with a threshold F_out. The feature with the least significance
is removed from the selected feature pool only if the significance is lower than F_out. This
completes  step  s  of  the  algorithm.  The  algorithm  terminates  when  no  more  features  can 
satisfy  the  criteria  for  either  being  added  to  or  removed  from  the  selected  feature  pool. 
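
The feature entry half of one step can be sketched in Python as follows. This is an illustration only, and several parts are assumptions rather than details given above: the multivariate Wilks’ lambda is computed as det(W)/det(T), with W the within-group and T the total scatter matrix; the F-to-enter statistic uses the standard partial-F form, which the text does not spell out; feature removal is omitted; and all function names are hypothetical.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def scatter(samples, idx):
    """Scatter matrix of the features in idx about their own mean."""
    n = len(samples)
    mean = [sum(x[k] for x in samples) / n for k in idx]
    s = [[0.0] * len(idx) for _ in idx]
    for x in samples:
        diff = [x[k] - mu for k, mu in zip(idx, mean)]
        for a in range(len(idx)):
            for b in range(len(idx)):
                s[a][b] += diff[a] * diff[b]
    return s


def wilks_lambda(groups, idx):
    """det(W) / det(T) for the feature subset idx (1.0 if idx is empty)."""
    if not idx:
        return 1.0
    pooled = [x for group in groups for x in group]
    total = scatter(pooled, idx)
    within = [[0.0] * len(idx) for _ in idx]
    for group in groups:
        for a, row in enumerate(scatter(group, idx)):
            for b, value in enumerate(row):
                within[a][b] += value
    return determinant(within) / determinant(total)


def f_to_enter(groups, selected, candidate):
    """Partial F statistic for adding one candidate feature (assumed form).

    A candidate that separates the groups perfectly gives a zero ratio and
    is not handled in this sketch.
    """
    n = sum(len(group) for group in groups)
    g, p = len(groups), len(selected)
    ratio = (wilks_lambda(groups, selected + [candidate])
             / wilks_lambda(groups, selected))
    return (n - g - p) / (g - 1) * (1.0 - ratio) / ratio


def forward_step(groups, selected, candidates, f_in):
    """Enter the most significant remaining feature if it exceeds f_in."""
    remaining = [c for c in candidates if c not in selected]
    if not remaining:
        return selected
    best = max(remaining, key=lambda c: f_to_enter(groups, selected, c))
    if f_to_enter(groups, selected, best) > f_in:
        return selected + [best]
    return selected
```

A removal step would mirror `forward_step`, dropping the selected feature with the smallest partial F when it falls below F_out.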

2.5.  Genetic  algorithms  for  feature  selection 

Genetic  algorithms  solve  optimization  problems  by  mimicking  the  natural  selection  process. 
A  GA  follows  the  evolution  of  a  population  of  chromosomes  which  are  encoded  so  that 
each  chromosome  corresponds  to  a  possible  solution  of  the  optimization  problem.  The 
chromosomes  consist  of  genes,  which  are  components  of  the  solution.  The  goal  of  a  GA 
is  to  search  for  better  combinations  of  the  genes,  i.e.  new  chromosomes  which  are  better 
solutions  to  the  optimization  problem.  This  goal  is  achieved  by  evolution.  A  new  generation 
of  chromosomes  is  produced  from  the  current  population  by  means  of  parent  selection, 
crossover  and  mutation.  The  probability  that  a  chromosome  is  selected  as  a  parent  is 
related  to  its  ability  to  solve  the  optimization  problem,  i.e.  its  fitness.  Chromosomes  which 
are  better  solutions  to  the  optimization  problem  are  given  a  higher  chance  to  reproduce  than 
those  which  are  worse  solutions  to  the  problem,  similar  to  the  principle  of  natural  selection. 
The  fitness  of  a  chromosome  is  computed  using  a  fitness  function,  which  is  designed  on 
the basis of the optimization criterion for the problem. The probability that a chromosome
is selected as a parent is equal to its normalized fitness, which is defined as the fitness of
the  chromosome  divided  by  the  sum  of  fitnesses  for  all  chromosomes.  The  chromosomes 
of  the  selected  parents  are  allowed  to  randomly  cross  over  and  mutate,  introducing  new 
genes  and  new  chromosomes  into  the  population.  This  process  generates  a  new  population 
of  chromosomes,  which  tends  to  evolve  towards  a  better  solution. 
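
One generation of this process can be sketched in Python as follows. Parent selection follows the normalized-fitness rule stated above; the single-point crossover is an assumption, since the type of crossover is not specified here, and the function names and the `rng` argument are hypothetical. Chromosomes are represented as lists of bits.

```python
import random


def normalized_fitness(fitnesses):
    """Fitness of each chromosome divided by the sum over the population."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def select_parent(population, fitnesses, rng):
    """Pick a chromosome with probability equal to its normalized fitness."""
    pick = rng.random()
    cumulative = 0.0
    for chromosome, prob in zip(population, normalized_fitness(fitnesses)):
        cumulative += prob
        if pick < cumulative:
            return chromosome
    return population[-1]  # guard against floating-point round-off


def crossover(a, b, rate, rng):
    """With probability `rate`, swap the tails of two parents (assumed
    single-point crossover); otherwise return copies of the parents."""
    if len(a) > 1 and rng.random() < rate:
        point = rng.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def next_generation(population, fitnesses, crossover_rate, mutation_rate, rng):
    """Produce a new population of the same size as the current one."""
    offspring = []
    while len(offspring) < len(population):
        parent_a = select_parent(population, fitnesses, rng)
        parent_b = select_parent(population, fitnesses, rng)
        for child in crossover(parent_a, parent_b, crossover_rate, rng):
            offspring.append(mutate(child, mutation_rate, rng))
    return offspring[:len(population)]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing different fitness functions.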

GAs have been applied to the problem of feature selection (Brill et al 1992, Sahiner
et  al  1996c).  The  most  natural  way  of  encoding  a  chromosome  for  this  problem  is  as 
follows  (Sahiner  et  al  1996c).  Each  gene  in  a  chromosome  is  a  bit,  which  takes  a  value  of 
either  1  or  0.  Each  gene  location  in  a  chromosome  corresponds  to  a  particular  feature.  If 
the  bit  value  at  a  gene  location  is  1,  the  corresponding  feature  is  selected  for  the  solution 
of  the  classification  problem.  Otherwise,  the  corresponding  feature  is  not  selected.  Each 
chromosome  thus  defines  a  set  of  selected  features.  A  statistical  classifier,  such  as  Fisher’s 
linear  classifier  or  a  neural  network  classifier,  is  then  employed  for  classification  based  on 
the  selected  feature  set.  The  fitness  function  reflects  the  success  of  the  selected  feature  set  for 
solving  the  classification  problem.  The  design  of  the  fitness  function  for  a  high-sensitivity 
classifier  is  described  in  the  next  section.  The  GA  training  method  and  the  choice  of  GA 
parameters  are  summarized  next. 
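
The bit-string encoding can be illustrated with a short Python sketch. The function names are hypothetical; the only assumption is that gene location k corresponds to feature k of a fixed-order feature vector (which would give a chromosome of 340 genes if the pool consists of all 320 SGLD and 20 RLS features described earlier).

```python
def selected_features(chromosome):
    """Indices of the features whose gene is set to 1."""
    return [k for k, bit in enumerate(chromosome) if bit == 1]


def reduce_feature_vector(features, chromosome):
    """Keep only the feature values selected by the chromosome."""
    if len(features) != len(chromosome):
        raise ValueError("one gene is required per feature")
    return [features[k] for k in selected_features(chromosome)]
```

The reduced vectors are what the statistical classifier is trained and tested on when the fitness of a chromosome is evaluated.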


2.5.1. GA training. The GA in this study was trained using a leave-one-case-out paradigm.
In  this  paradigm,  all  ROIs  except  those  from  a  particular  patient  were  defined  as  the 
training  set,  and  the  ROIs  from  that  particular  patient  were  defined  as  the  test  set.  For 
each  chromosome  of  the  GA,  the  coefficients  of  Fisher’s  linear  discriminant  function  were 
determined  using  the  features  of  the  training  set.  The  trained  discriminant  function  was 
then  used  to  classify  the  test  cases  using  the  features  of  the  test  cases  as  the  input.  In  a 
given  generation  of  the  GA,  all  patients  were  visited  in  a  round-robin  manner,  so  that  test 
scores  were  obtained  for  each  ROI  in  the  entire  data  set.  The  fitness  of  the  chromosome 
was  computed  based  on  the  classification  accuracy  for  the  test  cases,  as  described  in  the 
next  section. 
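
The leave-one-case-out loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names, the tuple layout of `rois` and the toy data are hypothetical, and Fisher's discriminant is written in its textbook form w = Sw^-1 (m_pos - m_neg), with Sw assumed to be non-singular.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (a non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fisher_weights(pos, neg):
    """Fisher's linear discriminant weights from the within-class scatter matrix."""
    m_pos, m_neg = mean(pos), mean(neg)
    d = len(m_pos)
    sw = [[0.0] * d for _ in range(d)]
    for rows, mu in ((pos, m_pos), (neg, m_neg)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return solve(sw, [p - q for p, q in zip(m_pos, m_neg)])


def leave_one_case_out(rois):
    """rois: list of (patient_id, feature_vector, label), label 1 = malignant.

    Returns one test score per ROI; each score comes from a discriminant
    trained on all ROIs except those of the ROI's own patient.
    """
    scores = [None] * len(rois)
    for held_out in sorted({pid for pid, _, _ in rois}):
        pos = [x for pid, x, y in rois if pid != held_out and y == 1]
        neg = [x for pid, x, y in rois if pid != held_out and y == 0]
        w = fisher_weights(pos, neg)
        for k, (pid, x, _) in enumerate(rois):
            if pid == held_out:
                scores[k] = sum(wi * xi for wi, xi in zip(w, x))
    return scores


# Made-up data: six patients, two of whom contribute two ROIs each.
rois = [
    ("p1", [2.0, 1.0], 1), ("p1", [2.5, 1.4], 1),
    ("p2", [3.0, 0.5], 1),
    ("p3", [2.2, 2.0], 1),
    ("p4", [0.0, 0.2], 0), ("p4", [0.4, 1.0], 0),
    ("p5", [1.0, 0.0], 0),
    ("p6", [0.5, 1.5], 0),
]
scores = leave_one_case_out(rois)
```

Holding out by patient rather than by ROI keeps multiple views of the same lesion from appearing in both the training and the test set.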

2.5.2. GA parameters. The fundamental parameters of a GA are the number of chromosomes,
the chromosome length, the crossover rate, the mutation rate and the stopping
criterion. In a GA, the population must contain a large number of chromosomes to provide
the variability that offers the opportunity to evolve towards the optimal solution. This
requirement must be traded off against computing speed when selecting the number of
chromosomes in a given application. The length of a chromosome is determined by the
encoding mechanism which translates the optimization problem into a GA. With the encoding
mechanism  described  earlier  in  this  subsection,  the  length  of  each  chromosome  is  equal  to  the 
total  number  of  features.  The  fitness  function  is  the  most  important  component  of  the  GA, 
and  its  design  is  described  in  the  next  section.  Pairs  of  chromosomes  are  probabilistically 
selected  as  parents  based  on  their  fitness.  A  selected  pair  may  exchange  genes  to  generate 
two  offspring.  The  crossover  rate  determines  the  probability  that  parents  will  exchange 
genes.  After  crossover,  the  binary  value  of  each  bit  may  probabilistically  be  altered  (from  1 
to  0,  or  vice  versa),  i.e.  mutated.  The  mutation  rate  determines  the  probability  that  genes  will 
undergo  mutation.  The  increase  in  the  fitness  of  the  chromosomes  starts  to  stagnate  after  a 
number  of  generations.  The  stopping  criterion  determines  when  the  evolution  is  terminated. 
In  this  study,  the  GA  evolution  was  terminated  after  a  fixed  number  of  iterations.  The 
appropriateness  of  this  stopping  criterion  is  discussed  in  section  4.  After  the  termination, 
the chromosome with the highest fitness value provided the set of selected features.
Table 1 shows the values of each of these parameters, selected based on our previous
work.  More  detailed  discussion  of  these  operators  and  parameters  can  be  found  in  the 
literature  (Sahiner  et  al  1996c). 


Table 1. GA parameters used in this study.

Parameter                 Value
Crossover rate            0.9
Mutation rate             0.0025
Chromosome length         340
Number of chromosomes     200
Stopping criterion        200 iterations
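
The crossover and mutation operators can be sketched with the rates listed in Table 1. The text does not say which crossover variant was used, so the single-point crossover below is an assumption, and the function names are hypothetical.

```python
import random

CROSSOVER_RATE = 0.9      # Table 1
MUTATION_RATE = 0.0025    # Table 1
CHROMOSOME_LENGTH = 340   # Table 1


def crossover(parent_a, parent_b, rng, rate=CROSSOVER_RATE):
    """With probability `rate`, swap the tails of the parents at a random cut point.

    Single-point crossover is an assumption; otherwise the parents are copied unchanged.
    """
    if rng.random() < rate:
        cut = rng.randrange(1, len(parent_a))
        return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]
    return parent_a[:], parent_b[:]


def mutate(chromosome, rng, rate=MUTATION_RATE):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


rng = random.Random(0)
mother = [rng.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
father = [rng.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
child_a, child_b = crossover(mother, father, rng)
child_a, child_b = mutate(child_a, rng), mutate(child_b, rng)
```

With a mutation rate of 0.0025 and 340 genes, the expected number of flipped bits per chromosome per generation is 340 x 0.0025 = 0.85, i.e. slightly fewer than one.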


2.6. Design of a high-sensitivity classifier


A  widely  accepted  method  for  comparing  the  performance  of  two  classifiers  is  to  consider 
their  ROC  curves.  The  area  under  the  ROC  curve  is  a  commonly  used  index  for  this 
comparison.  However,  for  applications  where  the  performance  at  high  sensitivity  (or  high 
true-positive  fraction)  is  important,  for  example  breast  lesion  characterization  in  CAD,  this 
index may be inadequate. Jiang et al (1996) explored this issue, and defined an ROC partial
area index that will be denoted as A_TPF0 in this paper.

The partial area index A_TPF0 summarizes the average specificity above a sensitivity of
TPF_0 (figure 6), and can be expressed as (Jiang et al 1996)

$$A_{\mathrm{TPF}_0} = 1 - \frac{1}{1 - \mathrm{TPF}_0} \int_{\mathrm{TPF}_0}^{1} \mathrm{FPF}(\mathrm{TPF})\, \mathrm{d}(\mathrm{TPF}) \qquad (1)$$

which is the ratio of the partial area under the actual ROC curve to the partial area of
the perfect ROC curve. The maximum value for A_TPF0 is thus 1. The A_TPF0 value for a
classifier that operates purely on random guessing is (1 - TPF_0)/2, which is the area under
the chance diagonal normalized to 1 - TPF_0.
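
As a check on the random-guessing value quoted above, the chance diagonal FPF(TPF) = TPF can be substituted into equation (1). The derivation below is a worked example added for clarity and uses only the symbols of equation (1).

```latex
\int_{\mathrm{TPF}_0}^{1} \mathrm{TPF}\,\mathrm{d}(\mathrm{TPF})
  = \frac{1 - \mathrm{TPF}_0^{2}}{2}
\quad\Longrightarrow\quad
A_{\mathrm{TPF}_0}
  = 1 - \frac{1}{1 - \mathrm{TPF}_0}\cdot\frac{(1 - \mathrm{TPF}_0)(1 + \mathrm{TPF}_0)}{2}
  = \frac{1 - \mathrm{TPF}_0}{2}
```

For TPF_0 = 0.95 this gives 0.025, for TPF_0 = 0.50 it gives 0.25, and for TPF_0 = 0 it gives 0.5, the familiar chance value of the area under the entire ROC curve.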

When  the  conventional  binormal  model  is  employed  for  the  computation  of  the  ROC 
curve,  the  curve  is  completely  defined  by  two  parameters,  a  and  b,  which  are  determined 
from  the  rating  data  using  maximum  likelihood  estimation.  The  constant  b  represents  the 
estimated  standard  deviation  of  the  actually  negative  cases,  normalized  by  the  estimated 
standard  deviation  of  the  actually  positive  cases,  and  the  constant  a  represents  the  estimated 
difference  between  the  means  of  actually  positive  and  negative  cases,  normalized  again 
by  the  estimated  standard  deviation  of  the  actually  positive  cases.  Using  the  binormality 
assumption, the partial area index A_TPF0 can be expressed as (McClish 1989, Jiang et al
1996)

[Equation (2) and the two auxiliary definitions that followed it ("where ... and ...") are
not legible in the source.]

Figure 6. The partial area index A_TPF0 is defined as the ratio of the partial area under the ROC
curve above a given sensitivity (grey area) to the partial area of the perfect ROC curve (hatched
region) above the same sensitivity.


Our goal in this study was to train a GA to select features which would yield high
specificity in the high-sensitivity region of the ROC curve. Therefore, the fitness of a
chromosome was defined as a monotonic function of A_TPF0, such that the maximization of
A_TPF0 would maximize the fitness function

$$\mathrm{fitness} = \left( \frac{A_{\mathrm{TPF}_0} - A_{\min}}{A_{\max} - A_{\min}} \right)^{n} \qquad (3)$$

where A_max and A_min were the maximum and minimum values of A_TPF0 among all
chromosomes in a generation, and n was a power parameter whose effect on GA feature
selection was investigated, as discussed in section 3. From equation (3), it is seen that as
the power parameter becomes larger, the differences in fitness between chromosomes, and thus
the differences in their probabilities of being chosen as parents, are amplified. The choice
of n is a trade-off between the goal of promoting chromosomes with high fitness values and the
need to retain segments of good genes in other chromosomes.
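
The effect of the power parameter n in equation (3) can be seen in a short sketch. The function names are hypothetical, the handling of a generation in which all chromosomes tie is an assumption, and fitness-proportional (roulette-wheel) selection is assumed because the text only says that parents are chosen probabilistically according to fitness.

```python
import random


def fitness_values(a_tpf0, n):
    """Equation (3): ((A - A_min) / (A_max - A_min)) ** n over one generation."""
    a_min, a_max = min(a_tpf0), max(a_tpf0)
    if a_max == a_min:
        # Assumption: if every chromosome ties, treat them all as equally fit.
        return [1.0] * len(a_tpf0)
    return [((a - a_min) / (a_max - a_min)) ** n for a in a_tpf0]


def select_parents(population, fitness, rng):
    """Draw two parents with probability proportional to fitness (assumed scheme)."""
    return rng.choices(population, weights=fitness, k=2)


generation = [0.25, 0.5, 0.75]          # made-up A_TPF0 values for three chromosomes
for n in (1, 2, 4):
    print(n, fitness_values(generation, n))
```

For the middle chromosome the fitness falls from 0.5 (n = 1) to 0.25 (n = 2) and 0.0625 (n = 4), while the best chromosome stays at 1, so a larger n concentrates reproduction on the best chromosomes. The worst chromosome of a generation always has zero fitness under equation (3).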

For a given chromosome, the parameters a and b that are required for the computation
of A_TPF0 were determined from the distribution of test scores using the LABROC program
of Metz et al (1998). The partial area index A_TPF0 was then computed by numerically
integrating equation (2). The classifiers thus designed will be referred to as GA-based
high-sensitivity classifiers in the following discussions.
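
A numerical evaluation of the partial area index under the binormal model can be sketched as follows. This is not the LABROC program and not necessarily the form of equation (2): it substitutes the standard binormal ROC curve, TPF = Phi(a + b Phi^-1(FPF)), into equation (1) and integrates with a midpoint rule. The function names and the step count are assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def binormal_fpf(tpf, a, b):
    """FPF at a given TPF on the binormal ROC curve TPF = Phi(a + b * Phi^-1(FPF))."""
    return _STD_NORMAL.cdf((_STD_NORMAL.inv_cdf(tpf) - a) / b)


def binormal_partial_area(a, b, tpf0, steps=20_000):
    """Equation (1) evaluated on a binormal curve by the midpoint rule."""
    if not 0.0 <= tpf0 < 1.0:
        raise ValueError("tpf0 must lie in [0, 1)")
    h = (1.0 - tpf0) / steps
    total = 0.0
    for i in range(steps):
        # Midpoints avoid TPF = 0 and TPF = 1, where Phi^-1 is infinite.
        total += binormal_fpf(tpf0 + (i + 0.5) * h, a, b)
    return 1.0 - total * h / (1.0 - tpf0)


a_095 = binormal_partial_area(a=2.0, b=1.0, tpf0=0.95)   # made-up a and b
a_z = binormal_partial_area(a=2.0, b=1.0, tpf0=0.0)      # TPF_0 = 0: whole curve
```

Two sanity checks follow from the definitions: with a = 0 and b = 1 the curve is the chance diagonal and the index equals (1 - TPF_0)/2, and with TPF_0 = 0 the index equals the area under the entire binormal curve, Phi(a / sqrt(1 + b^2)).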

In this study, the significance of the difference in A_TPF0 of different classifiers was
determined using a recently developed statistical test (Jiang et al 1996). The test is analogous
to statistical tests involving the area A_z under the entire ROC curve, and is implemented
using the covariance estimates of a and b values for the two curves.


3.  Results 

To demonstrate the training of high-sensitivity classifiers using GA, we chose two levels
of sensitivity thresholds, TPF_0 = 0.50 and TPF_0 = 0.95 in equation (1). The classification
results of these classifiers were compared with those of LDAsfs. GA-based feature selection
was also performed with no emphasis on high sensitivity (TPF_0 = 0). The classifier
designed with the features thus selected will be referred to as an ordinary GA-based classifier.
Its performance was compared with those of the GA-based high-sensitivity classifiers and
LDAsfs.

Table 2. The number of features, the area under the ROC curve (A_z), the partial area above the
true positive fraction of 0.5 (A_0.50), and that above 0.95 (A_0.95) for various values of F_in and
F_out in the stepwise feature selection method.

F_in   F_out   Number of selected features   A_z    A_0.50   A_0.95
3.8    2.7      9                            0.84   0.71     0.22
2.6    2.4     13                            0.85   0.72     0.27
2.2    2.0     14                            0.86   0.73     0.25
1.8    1.6     26                            0.89   0.80     0.38
1.4    1.2     41                            0.92   0.83     0.47
1.0    1.0     49                            0.92   0.83     0.46

Figure 7. The evolution of the number of selected features for a GA training session (n = 4,
TPF_0 = 0.95).

In LDAsfs, the optimal values of the F_in and F_out thresholds are not known a priori. We
therefore varied these thresholds to obtain the feature subset with the best test performance.
Table 2 shows the number of selected features, the area A_z under the ROC curve, the
partial area above the true positive fraction of 0.5 (A_0.50), and that above 0.95 (A_0.95) as
these F thresholds are varied. By comparing the A_z values and the performance at the
high-sensitivity portion of the ROC curve, the combination F_in = 1.4, F_out = 1.2 was found
to provide the best feature subset.

High-sensitivity classifiers with TPF_0 = 0.50 and TPF_0 = 0.95 were trained with three
different values of the power parameter, n (n = 1, 2 and 4). Figure 7 shows the evolution
of the number of selected features, and figure 8 shows the total area under the ROC curve
(A_z) and the partial area above the true positive fraction of 0.95 (A_0.95) for a typical GA
training (n = 4, TPF_0 = 0.95).

Figure 8. The evolution of the area A_z and the partial area A_0.95 under the ROC curve for the
GA training session of figure 7 (n = 4, TPF_0 = 0.95).

The ROC curve of the best LDAsfs classifier and those of GA-based classifiers
(TPF_0 = 0.50 and TPF_0 = 0.95) with n = 1, 2 and 4 are compared in figures 9-11
respectively. It is observed from figures 10 and 11 that for n = 2 or 4, the designed
high-sensitivity classifiers seem to be superior to the best LDAsfs classifier at large
true-positive fractions. When n = 1, the ROC curves of the GA-based high-sensitivity classifiers
are still higher than that of the LDAsfs classifier when TPF is very close to 1; however,
the difference between the curves is small. To quantify the improvement obtained by
the GA-based high-sensitivity classifier, we performed statistical significance tests (Jiang
et al 1996) on the partial area above a true-positive threshold of 0.95 (A_0.95) as described
in the previous section. With n = 4, the difference between the partial areas of the
GA-based high-sensitivity classifiers and LDAsfs above a true-positive threshold of 0.95 was
statistically significant with two-tailed p-levels of 0.006 and 0.02 for the classifiers trained
with TPF_0 = 0.95 and TPF_0 = 0.5 respectively. For n = 2, the corresponding p-levels were
0.01 and 0.07 respectively. For n = 1, the difference did not achieve statistical significance
(p = 0.14 for TPF_0 = 0.95 and p = 0.49 for TPF_0 = 0.5). The difference of the partial
area index over a true-positive threshold of 0.5 (A_0.50) did not achieve statistical significance
when the high-sensitivity classifiers trained with TPF_0 = 0.5 were compared with LDAsfs
for any of the power parameters studied (n = 1, 2 and 4).

The performance of the high-sensitivity classifiers and the ordinary GA-based classifiers
(TPF_0 = 0) are also compared in figures 9-11. It is observed that the difference between
the high-sensitivity and the ordinary GA-based classifiers is less than the difference between
the high-sensitivity classifiers and the LDAsfs. With a two-tailed significance test, it was
found that the difference between the partial areas of the high-sensitivity and the ordinary
GA-based classifiers above a true-positive threshold of 0.95 (A_0.95) did not achieve statistical
significance for any of the power parameter values studied (n = 1, 2 and 4) with p-levels
ranging between 0.06 and 0.5. Similarly, the difference between the ordinary GA-based
classifiers and LDAsfs did not achieve statistical significance for any of the power parameter
values studied. Table 3 summarizes the A_z, A_0.50 and A_0.95 values, as well as the number
of features selected by each classifier.

Figure 9. The ROC curves of the LDAsfs, the ordinary GA-based classifier (TPF_0 = 0), and the
GA-based high-sensitivity classifiers trained with TPF_0 = 0.50 and TPF_0 = 0.95 using power
parameter n = 1: (a) the entire ROC curves, (b) enlargement of the curves for TPF > 0.8.

Figure 10. The ROC curves of the LDAsfs, the ordinary GA-based classifier (TPF_0 = 0), and the
GA-based high-sensitivity classifiers trained with TPF_0 = 0.50 and TPF_0 = 0.95 using power
parameter n = 2: (a) the entire ROC curves, (b) enlargement of the curves for TPF > 0.8.

Figures 12 and 13 show the distributions of the classifier outputs for the high-sensitivity
classifier (n = 4, TPF_0 = 0.95) and the LDAsfs respectively. Using the LDAsfs, the
distribution of the malignant masses has a relatively long tail that overlaps with the
distribution of the benign masses. With the high-sensitivity classifier, this tail seems to
be shortened, so that more benign masses may be correctly diagnosed without missing
malignancies. At 100% sensitivity, the specificity with the appropriate choice of the decision
threshold was 61% and 34% for the high-sensitivity classifier and the LDAsfs respectively.
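
The specificity at 100% sensitivity quoted above is obtained by placing the decision threshold at the lowest score of any malignant case. A minimal sketch follows; it assumes that a higher discriminant score means more suspicious, and the function name and scores are made up for illustration.

```python
def specificity_at_full_sensitivity(malignant_scores, benign_scores):
    """Fraction of benign cases scoring strictly below the lowest malignant score.

    Assumes higher score = more suspicious, so every malignant case is still
    called positive when the threshold sits at the lowest malignant score.
    """
    threshold = min(malignant_scores)
    correctly_benign = sum(1 for s in benign_scores if s < threshold)
    return correctly_benign / len(benign_scores)


malignant = [0.2, 0.9, 1.5]                # made-up discriminant scores
benign = [-1.0, -0.3, 0.1, 0.4, 0.8]
print(specificity_at_full_sensitivity(malignant, benign))
```

Because the threshold is set by a single extreme malignant score, this figure is sensitive to the length of the tail of the malignant distribution, which is why shortening that tail raises the specificity.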

4.  Discussion 

Figures 10 and 11 demonstrate that when the feature selection is performed with a properly
designed fitness function in the GA, the designed classifier can be more effective than
LDAsfs in the high-sensitivity region of the ROC curve. From table 3 it is observed that
although the A_z value for the properly trained high-sensitivity classifier (e.g. TPF_0 = 0.5
or 0.95 and n = 2 or 4) may be less than that of the LDAsfs, the partial area index A_0.95 is
larger. The statistical analysis in this study showed that the difference between the properly
designed high-sensitivity classifiers and the LDAsfs at the high-sensitivity region of the ROC
curve can be significant.

Figure 11. The ROC curves of the LDAsfs, the ordinary GA-based classifier (TPF_0 = 0), and the
GA-based high-sensitivity classifiers trained with TPF_0 = 0.50 and TPF_0 = 0.95 using power
parameter n = 4: (a) the entire ROC curves, (b) enlargement of the curves for TPF > 0.8.

Table 3. The number of features, the area under the ROC curve (A_z), the partial area above the
true positive fraction of 0.5 (A_0.50), and that above 0.95 (A_0.95) for the GA parameters studied.
For comparison purposes, the results with linear discriminant analysis are also included as the
last row.

Power          TPF_0 value for   Number of
parameter, n   GA training       selected features   A_z           A_0.50        A_0.95
1              0                 62                  0.90 ± 0.02   0.81 ± 0.03   0.47 ± 0.07
1              0.5               61                  0.89 ± 0.02   0.81 ± 0.03   0.51 ± 0.07
1              0.95              58                  0.84 ± 0.02   0.76 ± 0.03   0.55 ± 0.05
2              0                 60                  0.93 ± 0.02   0.86 ± 0.03   0.51 ± 0.08
2              0.5               48                  0.91 ± 0.02   0.85 ± 0.03   0.58 ± 0.07
2              0.95              50                  0.88 ± 0.02   0.82 ± 0.03   0.63 ± 0.05
4              0                 40                  0.92 ± 0.02   0.85 ± 0.03   0.56 ± 0.07
4              0.5               39                  0.91 ± 0.02   0.85 ± 0.03   0.62 ± 0.06
4              0.95              40                  0.87 ± 0.02   0.81 ± 0.03   0.64 ± 0.05
Linear discriminant analysis     41                  0.92 ± 0.02   0.83 ± 0.03   0.47 ± 0.07

Comparing figure 9 with figures 10 and 11, it is observed that the selection of the
power parameter n in GA training may be important. The classifiers designed with n = 1
did not exhibit a major advantage over the LDAsfs, as also seen from table 3 and the
statistical significance tests. From equation (3), it is seen that as the power parameter
becomes larger, the differences in fitness between chromosomes, and thus the differences in
their probabilities of being chosen as parents, are amplified. Therefore, a larger value of n
favours the reproduction of better chromosomes in a generation. Although it is desirable
to favour the better chromosomes in any GA, too much emphasis on better
chromosomes might suppress the chance of retaining segments of good genes in other
chromosomes in the gene pool. This is best seen by letting n tend to infinity, and observing
that only the best single chromosome will reproduce in this case, which reduces the GA
to a random search algorithm. In our application, from table 3, it is observed that, for all
three sensitivity thresholds (TPF_0 = 0.95, 0.50 and 0), the classifier trained with n = 1 has
lower performance indices (A_0.95, A_0.50 and A_z) than its counterpart trained with n = 2 or
n = 4. Although none of these differences reached statistical significance, the consistently
poorer performance of the classifiers trained with n = 1 indicates that n = 1 may not be a
good choice for GA training.

Figure 12. The distribution of the classifier output for the high-sensitivity classifier with n = 4,
TPF_0 = 0.95. By setting an appropriate threshold on these classifier scores, 61% of masses
could correctly be classified as benign without missing any malignancies in this study.

Figure 13. The distribution of the classifier output for LDAsfs. By setting an appropriate
threshold on these classifier scores, 34% of masses could be correctly classified as benign
without missing any malignancies in this study.

From figures 7 and 8 it is observed that the best fitness and the number of selected features
did not change between iterations 140 and 200 for the high-sensitivity classifier with n = 4
and TPF_0 = 0.95. A similar trend was observed with the other values of n and TPF_0
investigated in this study. Therefore, 200 generations seems to be sufficient for the GA to
complete its evolution in this application. In figure 8, the best A_z value was attained around
the fiftieth generation, and the value did not change considerably afterwards. However,
the A_0.95 value increased until around 140 generations. This meant that the classification
accuracy at high sensitivity continued to increase although the A_z value did not change, i.e.
the shape of the ROC curve changed so that the specificity at the high-sensitivity region
of the ROC curve increased, while the specificity at the low-sensitivity region of the ROC
curve decreased.

Figures 9-11 and the statistical significance tests in section 3 show that although the
GA-based high-sensitivity classifiers perform better than the ordinary GA-based classifiers
at high sensitivity, the difference between the two classifiers is not statistically significant.
Comparison of the LDAsfs and the ordinary GA-based classifiers revealed that neither
the difference between the A_z values nor the difference between the A_0.95 values was
statistically significant (p > 0.3). However, the difference between the A_0.95 values of the
LDAsfs and the GA-based high-sensitivity classifiers trained with power parameter n = 2
and n = 4 was statistically significant (two-tailed p-level < 0.05), as described in section 3.
Thus, it was necessary to use a high-sensitivity classifier in order to obtain statistically
significant improvement over the LDAsfs.

The GA-based high-sensitivity classifiers (TPF_0 = 0.95 and TPF_0 = 0.5) and the
ordinary GA-based classifier (TPF_0 = 0) were designed to maximize the partial ROC areas
above the chosen true-positive fraction thresholds. From table 3, it is observed that this
goal is achieved for the GA-based classifiers with TPF_0 values of 0 and 0.95. For each n,
the GA-based classifier with TPF_0 = 0 (ordinary GA-based classifier) yielded the highest
A_z value, and the GA-based classifier with TPF_0 = 0.95 yielded the highest A_0.95 value
among the classifiers. For the classifier with TPF_0 = 0.5, the A_0.50 value was larger than
or equal to that of the other GA-based classifiers for n = 1 and n = 4. However, for
n = 2, the ordinary GA-based classifier (TPF_0 = 0) had the highest A_0.50 value, although
the difference was not statistically significant (p > 0.3). This result is not inconsistent with
the GA principles or operation. Since the GA training is based on stochastic search, the
GA tends to evolve towards the optimal solution, as evidenced by the comparison of the
GA-based classifiers in table 3. However, the optimality of the solution is not guaranteed,
and one may encounter situations in which the design goal is not fully achieved, as evidenced
by the fact that the ordinary GA-based classifier had the highest A_0.50 value for n = 2.

Given  the  probabilistic  nature  of  GA-based  feature  selection,  it  is  difficult  to  predict  the 
conditions  under  which  the  GA  may  select  a  feature  set  that  provides  a  better  high-sensitivity 
classifier than LDAsfs. Both our GA-based method and the stepwise feature selection
algorithm  were  designed  primarily  to  select  features  for  classifying  classes  that  have 
multivariate  Gaussian  distributions  and  equal  covariance  matrices.  When  these  assumptions 
are  not  satisfied,  the  accuracy  of  feature  selection  will  deteriorate  to  a  different  degree  for 
both  methods.  One  possible  explanation  for  the  relative  success  of  the  GA-based  feature 
selection  might  be  that  our  data  violate  the  assumptions  of  multivariate  normality  and  the 
equality  of  covariance  matrices,  and  that  the  GA-based  method  is  less  sensitive  to  these 
violations. 

In  this  study,  our  focus  was  to  develop  a  methodology  for  the  design  of  high-sensitivity 
classifiers  for  applications  in  CAD.  For  the  specific  application  of  discriminating  malignant 
and benign breast lesions, our data set was limited and the features selected by the GA
may not be the optimal set of features for the general population. The same is true for the
LDAsfs.  Considering  that  the  data  set  contained  only  255  masses,  the  number  of  features 
selected  both  by  the  GA  and  the  LDAsfs  was  large.  As  a  result,  if  a  classifier  trained  in  this 
study  is  applied  without  modification  to  the  population  at  large,  the  classification  accuracy 
is  likely  to  be  poorer  than  that  obtained  in  this  paper.  However,  the  methodology  developed 
in  this  study  is  general.  When  a  sufficiently  large  data  set  becomes  available,  the  GA-based 
high-sensitivity  feature  selection  algorithm  can  be  reapplied,  and  a  more  robust  feature  set 
can  be  determined.  The  number  of  training  cases  required  for  generalizable  classifier  design 
and  feature  selection  has  been  the  subject  of  recent  studies  (Raudys  and  Jain  1991,  Wagner 
et  al  1997,  Chan  et  al  1997b),  and  is  currently  under  investigation. 

An  important  consideration  concerning  the  use  of  GAs  for  optimization  is  the  speed 
of  computation.  Depending  on  the  number  of  final  features  selected,  the  GA-based  feature 
selection  implemented  in  this  study  (340  features,  200  chromosomes,  200  generations  and 
leave-one-case-out GA training) took between 24 and 60 h on an AlphaStation 500 (400 MHz
Alpha chip), whereas the stepwise feature selection performed on a PC-compatible computer
with a 90 MHz Pentium processor took less than 10 min. Therefore, GA-based feature
selection implemented in this study may not be practical for studies where the feature
selection has to be performed many times. The high-sensitivity classifier design method
developed in this study may be more appropriate if the speed of computation is of secondary
importance to the classification accuracy of the designed classifier. For example, the
GA-based high-sensitivity classifier can be trained only once when a final set of features is
desired  for  a  large  data  set  as  discussed  above. 

5.  Conclusion 

We have developed a GA-based method to design a high-sensitivity classifier for CAD
applications. The usefulness of the method was demonstrated on the problem of classifying
masses on digitized mammograms. Texture features extracted from RBST images were used
to distinguish malignant and benign masses. The accuracy of the high-sensitivity classifier
was shown to be significantly higher than that of LDAsfs above a true-positive fraction of
0.95. By using an appropriate decision threshold on the high-sensitivity classifier scores,
61% of the benign masses could correctly be identified without missing any malignant
masses. The GA may therefore be a useful tool in the design of high-sensitivity classifiers
for different classification problems in CAD or other applications.

We are developing computerized feature extraction and classification methods to analyze malignant
and benign microcalcifications on digitized mammograms. Morphological features that described
the size, contrast, and shape of microcalcifications and their variations within a cluster were
designed to characterize microcalcifications segmented from the mammographic background. Texture
features were derived from the spatial gray-level dependence (SGLD) matrices constructed at
multiple distances and directions from tissue regions containing microcalcifications. A genetic
algorithm (GA) based feature selection technique was used to select the best feature subset from the
multi-dimensional feature spaces. The GA-based method was compared to the commonly used
feature selection method based on the stepwise linear discriminant analysis (LDA) procedure.
Linear discriminant classifiers using the selected features as input predictor variables were
formulated for the classification task. The discriminant scores output from the classifiers were
analyzed by receiver operating characteristic (ROC) methodology and the classification accuracy was
quantified by the area, A_z, under the ROC curve. We analyzed a data set of 145 mammographic
microcalcification clusters in this study. It was found that the feature subsets selected by the
GA-based method are comparable to or slightly better than those selected by the stepwise LDA
method. The texture features (A_z = 0.84) were more effective than morphological features
(A_z = 0.79) in distinguishing malignant and benign microcalcifications. The highest classification
accuracy (A_z = 0.89) was obtained in the combined texture and morphological feature space. The
improvement was statistically significant in comparison to classification in either the morphological
(p = 0.002) or the texture (p = 0.04) feature space alone. The classifier using the best feature subset
from the combined feature space and an appropriate decision threshold could correctly identify 35%
of the benign clusters without missing a malignant cluster. When the average discriminant score
from all views of the same cluster was used for classification, the A_z value increased to 0.93 and the
classifier could identify 50% of the benign clusters at 100% sensitivity for malignancy.
Alternatively, if the minimum discriminant score from all views of the same cluster was used, the
A_z value would be 0.90 and a specificity of 32% would be obtained at 100% sensitivity. The results
of this study indicate the potential of using combined morphological and texture features for
computer-aided classification of microcalcifications. © 1998 American Association of Physicists in
Medicine. [S0094-2405(98)00910-9]

Key  words:  computer-aided  diagnosis,  mammography,  microcalcifications,  genetic  algorithm, 
linear  discriminant  analysis,  ROC  analysis 
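
The two ways of merging discriminant scores across views of the same cluster mentioned in the abstract (average and minimum) can be sketched as follows. The function name, the `(cluster_id, score)` layout and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def merge_view_scores(view_scores, how="mean"):
    """Combine per-view discriminant scores into one score per cluster.

    view_scores: iterable of (cluster_id, score) pairs, one pair per view.
    how: "mean" for the average score, "min" for the minimum score.
    """
    by_cluster = defaultdict(list)
    for cluster_id, score in view_scores:
        by_cluster[cluster_id].append(score)
    reducer = {"mean": mean, "min": min}[how]
    return {cid: reducer(scores) for cid, scores in by_cluster.items()}


views = [("c1", 0.25), ("c1", 0.75), ("c2", -0.5)]   # made-up scores
averaged = merge_view_scores(views, "mean")
lowest = merge_view_scores(views, "min")
```

The merged scores are then analyzed exactly like single-view scores, with one score per cluster entering the ROC analysis.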


I.  INTRODUCTION 

Mammography is the most sensitive method for early detection
of breast cancers. However, its specificity for differentiating
malignant and benign lesions is relatively low. In the
United States, the positive predictive value of mammography
ranges from about 15% to 30%. Various methods are being
developed to improve the sensitivity and specificity of breast
cancer detection. Computer-aided diagnosis (CAD) is considered
to be one of the promising approaches that may improve
the efficacy of mammography. Properly designed
CAD algorithms can automatically detect suspicious lesions
on a mammogram and alert the radiologist to these regions.
They can also extract image features from regions of interest
(ROIs) and estimate the likelihood of malignancy for a given
lesion, thereby providing the radiologist with additional
information for making diagnostic decisions.

There are two major approaches to the development of
CAD schemes for classification of mammographic abnormalities.
One approach uses computer vision techniques to
extract image features from the digitized mammograms and
classify the lesions based on the computer-extracted features.
The computer-extracted features can include morphological
features that are commonly used by radiologists for diagnosis,
as well as texture features that may not be readily perceived
by human eyes. The computerized analysis may
therefore increase the utilization of mammographic image
information and improve the accuracy of differentiating
malignant and benign lesions. The other approach uses
radiologists' ratings of mammographic features or encodes the
radiologists' readings with numerical values. The lesions are
then classified based on these radiologist-extracted features.
This approach assists radiologists by systematically extracting
image features and by optimally merging the features
with a statistical classifier to reach a diagnostic decision.
Additional risk factors based on patient demographic information
and medical or family histories may also be included
as input in either approach.

A number of investigators have developed feature extraction
and classification methods for characterization of
mammographic masses or microcalcifications. Ackerman et al.
developed 4 measures of malignancy and classified lesions
recorded on 120 digitized xeroradiographs by 3 decision
methods. Kilday et al. used 7 shape descriptors and patient
age to classify 39 masses and could correctly classify 69% of
the masses. Huo et al. analyzed the spiculation of masses
using a radial edge-gradient analysis technique and achieved
an area, $A_z$, under the receiver operating characteristic
(ROC) curve of 0.88 in a data set of 95 masses. Sahiner
et al. developed a rubber-band straightening image
transformation technique to analyze the texture in the region
surrounding a mass and obtained an $A_z$ of 0.94 in a data set of
168 masses. Pohlman et al. extracted 6 morphological
descriptors to classify 47 masses and obtained $A_z$ values
ranging from 0.76 to 0.93. Wee et al. analyzed 51
microcalcification clusters on specimen radiographs using the average
gray level, contrast, and horizontal length of the
microcalcifications and obtained 84% correct classification. Fox et al.
included cluster features in their classifier and obtained 67%
correct classification in a data set of 100 clusters from
specimen radiographs. Chan et al. developed morphological
and texture features and evaluated various feature classifiers
for differentiation of malignant and benign
microcalcifications. Shen et al. used 3 shape features (compactness,
moments, and Fourier descriptors) to classify 143 individual
microcalcifications with a nearest neighbor classifier and
obtained 100% classification accuracy. Wu et al. classified
80 pathologic specimen radiographs with a convolution
neural network and obtained an $A_z$ of 0.90. Jiang et al.
trained a neural network classifier to analyze 8 features
extracted from microcalcification clusters and obtained an $A_z$
of 0.92 in a data set of 53 patients. Thiele et al. extracted
texture and fractal features from the tissue region
surrounding a microcalcification cluster for classification and
achieved a sensitivity of 89% at a specificity of 83% for 54
clusters. Dhawan et al. used features derived from first-order
and second-order gray-level histogram statistics and
obtained an $A_z$ of 0.81 with a neural network classifier for a
data set of 191 clusters.

Computerized classification of mammographic lesions using
radiologist-extracted features has also been reported by a
number of investigators. Ackerman et al. estimated the
probability of malignancy of mammographic lesions by
analyzing 36 radiologist-extracted characteristics with an
automatic clustering algorithm and obtained a specificity of 45%
at a sensitivity of 100% in a data set of 102 cases. Gale
et al. analyzed 12 radiologist-extracted features of
mammographic lesions with a computer algorithm and obtained a
specificity of 88% at a sensitivity of 79% in a database of
500 patients. Getty et al. developed a computer classifier to
enhance the differentiation of malignant and benign lesions
by a radiologist during interpretation of xeromammograms.
Using a similar approach, D’Orsi et al. evaluated a
computer aid and obtained an improvement of about 0.05 in
sensitivity or specificity in mammographic reading. Wu et al.
trained a neural network to merge 14 radiologist-extracted
features for classification of mammographic lesions and
obtained an $A_z$ of 0.89. Baker et al. trained a neural network
based on the lexicon of the Breast Imaging Reporting and
Data System of the American College of Radiology and
found that the neural network could improve the positive
predictive value from 35% to 61% in 206 lesions. Lo et al.
used a similar approach to predict breast cancer invasion and
obtained an $A_z$ of 0.91 for 96 lesions. Although the results of
these studies varied over a wide range and the performance
of the computer algorithms is expected to depend strongly
on the data set, they indicate the potential of using CAD
techniques to improve the diagnostic accuracy of differentiating
malignant and benign lesions.

In our early studies, we found that texture features
extracted from spatial gray-level dependence (SGLD) matrices
at multiple distances were useful for differentiating
malignant and benign masses on mammograms. This may be
attributed to the texture changes in the breast tissue due to a
developing malignancy. The usefulness of SGLD texture
measures in differentiating malignant and benign breast
tissues was further demonstrated by analysis of mammographic
microcalcifications. In a preliminary study, we developed
morphological features to describe the size, shape, and
contrast of the individual microcalcifications and their
variation within a cluster. We used these features to classify the
microcalcifications and obtained moderate results. In the
present study, we expanded the data set and explored the
feasibility of combining texture and morphological features
for classification of microcalcifications. The classification
accuracy in the combined feature space was compared with
those obtained in the texture feature space or in the
morphological feature space alone. We also studied the use of a
genetic algorithm (GA) to select a feature subset from
the large-dimension feature spaces, and compared the
classification results to those obtained from features selected with
stepwise linear discriminant analysis (LDA). Linear
discriminant classifiers were designed for the classification
tasks. The performance of the classifiers was analyzed with
ROC methodology and the classification accuracy was
quantified with the area, $A_z$, under the ROC curve.

Fig. 1. Distribution of the visibility rankings of the 145 clusters of
microcalcifications. Higher ranking corresponds to more subtle clusters.

II.  MATERIALS  AND  METHODS 

A.  Data  set 

The  data  set  for  this  study  consisted  of  145  clusters  of 
microcalcifications  from  mammograms  of  78  patients.  The 
cases  were  selected  from  the  patient  files  in  the  Department 
of Radiology at the University of Michigan. The only selection
criterion was that the case included a biopsy-proven
microcalcification cluster. We kept the number of malignant and
benign cases reasonably balanced, so that 82 benign and 63
malignant  clusters  were  included.  All  mammograms  were 
acquired  with  a  contact  technique  using  mammography  sys¬ 
tems  accredited  by  the  American  College  of  Radiology 
(ACR).  The  dedicated  mammographic  systems  had  molyb¬ 
denum  anode  and  molybdenum  filter,  0.3  mm  nominal  focal 
spot,  reciprocating  grid,  and  Kodak  MinR/MinR  E  screen- 
film  systems  with  extended  processing.  A  radiologist  experi¬ 
enced  in  mammography  ranked  the  visibility  of  each  micro¬ 
calcification  cluster  on  a  scale  of  1  (obvious)  to  5  (subtle), 
relative  to  the  visibility  range  of  microcalcification  clusters 
encountered  in  clinical  practice.  The  histogram  of  the  visibil¬ 
ity  ranking  of  the  145  clusters  is  shown  in  Fig.  1.  The  histo¬ 
gram  indicated  the  mix  of  subtle  and  obvious  clusters  in¬ 
cluded  in  the  data  set. 

The  selected  mammograms  were  digitized  with  a  laser 
scanner (Lumisys DIS-1000) at a pixel size of 0.035 mm
× 0.035 mm and 12-bit gray levels. The digitizer has an
optical density (O.D.) range of about 0 to 3.5. The O.D. on the
film  was  digitized  linearly  to  pixel  value  at  a  calibration  of 
0.001  O.D.  unit/pixel  value  in  the  O.D.  range  of  about  0  to 
2.8.  The  digitizer  deviated  from  a  linear  response  at  O.D. 
higher  than  2.8. 

B.  Morphological  feature  space 

For  the  extraction  of  morphological  features,  the  locations 
of  the  individual  microcalcifications  have  to  be  known.  We 
have developed an automated program for detection of
individual microcalcifications. However, the detection
sensitivity is not 100% and the detected signals include
false positives. Furthermore, automated detection tends to have a
higher likelihood of detecting obvious microcalcifications
than subtle ones, which may bias the evaluation of the
classification capability of the extracted features and the trained
classifiers  if  microcalcifications  detected  by  the  automated 
program  are  used  for  classifier  development.  Since  these 
variables  are  program  dependent,  we  isolated  the  detection 
problem  from  the  classification  problem  in  this  study  by  us¬ 
ing  manually  identified  true  microcalcifications  for  the  mor¬ 
phological  feature  analysis.  The  true  microcalcifications 
were  defined  as  those  visible  on  the  film  mammograms  with 
a  magnifier.  Magnification  mammograms  were  used  occa¬ 
sionally  for  verification  when  they  were  available,  but  in 
most  cases  only  contact  mammograms  were  used.  At 
present,  there  is  no  other  method  that  can  more  reliably  iden¬ 
tify  individual  microcalcifications  on  mammograms.  Speci¬ 
men  radiographs  can  confirm  the  presence  of  the  microcalci¬ 
fications  but  the  locations  of  the  individual  micro¬ 
calcifications  cannot  be  correlated  with  those  on  the  mam¬ 
mograms  because  of  the  very  different  imaging  geometry 
and  techniques. 

We have developed an automated signal extraction program
to determine the size, contrast, signal-to-noise ratio
(SNR), and shape of the microcalcifications from a mammogram
based on the coordinate of each individual microcalcification.
In a local region of 101 × 101 pixels centered at each
signal site, the low frequency structured background is
estimated by polynomial curve fitting in the horizontal and
vertical directions and then averaging the fitted values obtained
in the two directions at each pixel. This background estimation
method is used because it can approximate the background
more closely than two-dimensional surface fitting or
the distance-weighted interpolation method (described below)
used for texture feature extraction. The central $l \times l$ pixels
that contain the signal are excluded from the curve fitting
and noise estimation. The size $l$ is chosen to be a constant of
15 pixels, which is larger than the diameters of the
microcalcifications of interest yet much smaller than the local region.
The background pixel values in this $l \times l$ region are estimated
from the fitted and smoothed background surface. The exclusion
of the signal region is necessary so that the high contrast
pixel values of the microcalcification will not affect the
background estimation at the signal site. Other microcalcifications
that may be located within the 101 × 101 pixel region are
treated as background pixels because their effect on the
estimated background levels at the signal site will be relatively
small.
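
The background estimation described above can be illustrated with a minimal sketch. It assumes NumPy, a square local region, and a second-degree polynomial; the degree is not stated in the text, and the function name `estimate_background` is hypothetical.

```python
import numpy as np

def estimate_background(region, signal_size=15, degree=2):
    """Estimate the low-frequency background of a square local region.

    Every row and every column is fitted with a polynomial, and the two
    fitted values are averaged at each pixel. Rows and columns that cross
    the central signal_size x signal_size block leave its pixels out of
    the fit. The polynomial degree is an assumption of this sketch.
    """
    region = np.asarray(region, dtype=float)
    n = region.shape[0]
    lo = (n - signal_size) // 2
    hi = lo + signal_size
    coords = np.arange(n)
    central = (coords >= lo) & (coords < hi)

    horiz = np.empty_like(region)
    vert = np.empty_like(region)
    for k in range(n):
        # Exclude the signal block only on lines that pass through it.
        use = ~central if lo <= k < hi else np.ones(n, dtype=bool)
        row_fit = np.polyfit(coords[use], region[k, use], degree)
        col_fit = np.polyfit(coords[use], region[use, k], degree)
        horiz[k, :] = np.polyval(row_fit, coords)
        vert[:, k] = np.polyval(col_fit, coords)
    return 0.5 * (horiz + vert)
```

Subtracting the returned array from the region gives the background-corrected signal; inside the central block the background is taken from the fitted curves, as the text describes.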

After  subtraction  of  the  structured  background,  the  local 
root-mean-square  (rms)  noise  is  calculated.  A  gray-level 
threshold  is  determined  as  the  product  of  the  rms  noise  and 
an  input  SNR  threshold.  With  a  region  growing  technique, 
the  signal  region  is  then  extracted  as  the  connected  pixels 
above  the  threshold  around  the  manually  identified  signal 
location.  A  high  threshold  will  result  in  extracting  only  the 
peak  pixels  of  the  microcalcification  which  may  not  repre¬ 
sent  its  shape  perceived  on  the  mammogram.  A  low  thresh¬ 
old  will  cause  the  microcalcification  region  to  grow  into  the 
surrounding background pixels. Since there is no objective
standard for what the actual shape of a microcalcification is on a
mammogram, the proper threshold to extract the signals was
determined by visually comparing the microcalcifications in
the original image and the thresholded image of the
microcalcifications superimposed on a background of constant
pixel values. After an experienced radiologist compared a
subset of randomly selected microcalcification clusters
extracted at different thresholds, an SNR threshold of 2.0 was
chosen for all cases. An example of a malignant cluster and
the microcalcifications extracted at an SNR threshold of 2.0
is shown in Fig. 2.

Fig. 2. An example of a cluster of malignant microcalcifications in the data
set: (a) the cluster with mammographic background, (b) the cluster after
segmentation. Morphological features are extracted from the segmented
microcalcifications.
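
The thresholding and region growing steps can be sketched as follows. This is an illustration only: the 8-connected neighborhood, the helper names `rms_noise` and `grow_signal`, and the list-of-lists image layout are assumptions, since the text does not specify them.

```python
import math
from collections import deque

def rms_noise(residual, exclude):
    """Root-mean-square of background-subtracted pixels outside `exclude`."""
    vals = [v for r, row in enumerate(residual)
            for c, v in enumerate(row) if (r, c) not in exclude]
    return math.sqrt(sum(v * v for v in vals) / len(vals))

def grow_signal(residual, seed, noise, snr_threshold=2.0):
    """Return the connected pixels above snr_threshold * noise around seed.

    Uses 8-connectivity (an assumption). Returns an empty set when the
    seed pixel itself is not above the gray-level threshold.
    """
    threshold = snr_threshold * noise
    rows, cols = len(residual), len(residual[0])
    if residual[seed[0]][seed[1]] <= threshold:
        return set()
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in region
                        and residual[nr][nc] > threshold):
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region
```

Raising `snr_threshold` shrinks the extracted region toward the peak pixels, and lowering it lets the region leak into the background, which is the tradeoff discussed above.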

Table I. The 21 morphological features extracted from a microcalcification
cluster.

Feature                                 Average  Standard   Coefficient   Maximum
                                                 deviation  of variation
Area                                    AVSA     SDSA       CVSA          MXSA
Mean density                            AVMD     SDMD       CVMD          MXMD
Eccentricity                            AVEC     SDEC       CVEC          MXEC
Moment ratio                            AVMR     SDMR       CVMR          MXMR
Axis ratio                              AVAR     SDAR       CVAR          MXAR
No. of microcalcifications in cluster   NUMS

The feature descriptors determined from the extracted
microcalcifications are listed in Table I. The size of a
microcalcification (SA) is estimated as the number of pixels in the
signal region. The mean density (MD) is the average of the
pixel values above the background level within the signal
region. The second moments are calculated as


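$$M_{xx} = \sum_i g_i (x_i - M_x)^2 / M_0, \tag{1}$$

$$M_{yy} = \sum_i g_i (y_i - M_y)^2 / M_0, \tag{2}$$

$$M_{xy} = \sum_i g_i (x_i - M_x)(y_i - M_y) / M_0, \tag{3}$$

where $g_i$ is the pixel value above the background, and
$(x_i, y_i)$ are the coordinates of the $i$th pixel. The moments
$M_0$, $M_x$, and $M_y$ are defined as follows:

$$M_0 = \sum_i g_i, \tag{4}$$

$$M_x = \sum_i g_i x_i / M_0, \tag{5}$$

$$M_y = \sum_i g_i y_i / M_0. \tag{6}$$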

The summations are over all pixels within the signal region.
The lengths of the major axis, $2a$, and the minor axis, $2b$, of
the effective ellipse that characterizes the second moments
are given by

$$2a = \sqrt{2\left[M_{xx} + M_{yy} + \sqrt{(M_{xx} - M_{yy})^2 + 4M_{xy}^2}\right]}, \tag{7}$$

$$2b = \sqrt{2\left[M_{xx} + M_{yy} - \sqrt{(M_{xx} - M_{yy})^2 + 4M_{xy}^2}\right]}. \tag{8}$$

The eccentricity (EC) of the effective ellipse can be derived
from the major and minor axes as

$$\epsilon = \frac{\sqrt{a^2 - b^2}}{a}. \tag{9}$$

The moment ratio (MR) is defined as the ratio of $M_{xx}$ to
$M_{yy}$, with the larger second moment in the denominator.
The axis ratio (AR) is the ratio of the major axis to the minor
axis of the effective ellipse.
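
As a worked illustration of the descriptors defined above, the sketch below computes SA, MD, EC, MR, and AR for one segmented microcalcification. It assumes central second moments normalized by $M_0$ and a signal region that extends over more than one pixel in both directions; the function name `shape_features` and the `(x, y, g)` input layout are hypothetical.

```python
import math

def shape_features(pixels):
    """Compute per-microcalcification descriptors.

    pixels: iterable of (x, y, g), where g is the pixel value above the
    background. Assumes the region is not a single pixel, so that the
    second moments are not all zero.
    """
    pixels = list(pixels)
    m0 = sum(g for _, _, g in pixels)
    mx = sum(g * x for x, _, g in pixels) / m0
    my = sum(g * y for _, y, g in pixels) / m0
    mxx = sum(g * (x - mx) ** 2 for x, _, g in pixels) / m0
    myy = sum(g * (y - my) ** 2 for _, y, g in pixels) / m0
    mxy = sum(g * (x - mx) * (y - my) for x, y, g in pixels) / m0

    root = math.sqrt((mxx - myy) ** 2 + 4.0 * mxy ** 2)
    two_a = math.sqrt(2.0 * (mxx + myy + root))
    # Clamp at zero to guard against tiny negative round-off.
    two_b = math.sqrt(max(2.0 * (mxx + myy - root), 0.0))
    a, b = two_a / 2.0, two_b / 2.0
    return {
        "SA": len(pixels),                      # area in pixels
        "MD": m0 / len(pixels),                 # mean density
        "EC": math.sqrt(max(a * a - b * b, 0.0)) / a,
        "MR": min(mxx, myy) / max(mxx, myy),    # larger moment in denominator
        "AR": two_a / two_b if two_b > 0 else math.inf,
    }
```

For a uniform 4 × 2 pixel rectangle, for example, the moments are $M_{xx} = 1.25$ and $M_{yy} = 0.25$, which gives MR = 0.2 and AR = $\sqrt{5}$.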

To quantify the variation of the visibility and shape
descriptors in a cluster, the maximum (MX), the average (AV),
and the standard deviation (SD) of each feature for the
individual microcalcifications in the cluster are calculated. The
coefficient of variation (CV), which is the ratio of the SD to
the AV, is used as a descriptor of the variability of a certain
feature within a cluster. Twenty cluster features are therefore
derived from the five features (size, mean density, moment
ratio, axis ratio, and eccentricity) of the individual
microcalcifications. Another feature describing the number of
microcalcifications in a cluster (NUMS) is also added, resulting in
a 21-dimensional morphological feature space.
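
The construction of the 21 cluster features from the per-microcalcification descriptors can be sketched as below. The use of the sample standard deviation is an assumption (the text does not say which estimator was used), and `cluster_features` is a hypothetical name.

```python
from statistics import mean, stdev

def cluster_features(calcs):
    """Build the 21 cluster features named in Table I.

    calcs: list of dicts, one per microcalcification, each with the
    keys SA, MD, EC, MR, and AR.
    """
    out = {"NUMS": len(calcs)}
    for key in ("SA", "MD", "EC", "MR", "AR"):
        vals = [c[key] for c in calcs]
        av = mean(vals)
        # Sample SD (assumption); a single-member cluster gets SD = 0.
        sd = stdev(vals) if len(vals) > 1 else 0.0
        out["AV" + key] = av
        out["SD" + key] = sd
        out["CV" + key] = sd / av if av else 0.0
        out["MX" + key] = max(vals)
    return out
```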

C.  Texture  feature  space 

Our texture feature extraction method has been described
in detail previously. Briefly, texture features are extracted
from a 1024 × 1024 pixel region of interest (ROI) that
contains the cluster of microcalcifications. Most of the clusters
in  this  data  set  can  be  contained  within  the  ROI.  For  the  few 
clusters  that  are  substantially  larger  than  a  single  ROI,  addi¬ 
tional  ROIs  containing  the  remaining  parts  of  the  cluster  are 
extracted  and  processed  in  the  same  way  as  the  other  ROIs. 
The  texture  feature  values  extracted  from  the  different  ROIs 
of  the  same  cluster  are  averaged  and  the  average  values  are 
used  as  the  feature  values  for  that  cluster. 

For  a  given  ROI,  background  correction  is  first  performed 
to  reduce  the  low  frequency  gray-level  variation  due  to  the 
density  of  the  overlapping  breast  tissue  and  the  x-ray  expo¬ 
sure  conditions.  The  gray  level  at  a  given  pixel  of  the  low 
frequency  background  is  estimated  as  the  average  of  the 
distance-weighted  gray  levels  of  four  pixels  at  the  intersec¬ 
tions  of  the  normals  from  the  given  pixel  to  the  four  edges  of 
the ROI. The estimated background image is subtracted
from the original ROI to obtain a background-corrected
image. An example of the background correction procedure is
shown  in  Fig.  3. 
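
A minimal sketch of the distance-weighted background estimate is given below. The inverse-distance weight `1 / (d + 1)` and the use of single edge pixels without smoothing are assumptions made for illustration; the text only states that the four edge gray levels are distance weighted. The function name is hypothetical, and the plain Python loops are meant for clarity, not for a 1024 × 1024 ROI.

```python
def distance_weighted_background(roi):
    """Estimate a low-frequency background from the four ROI edges.

    For each pixel, the gray levels at the feet of the normals to the
    left, right, top, and bottom edges are combined with weights that
    decrease with distance (1 / (d + 1), an assumed weighting).
    """
    rows, cols = len(roi), len(roi[0])
    bg = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # (edge gray level, distance from (r, c) to that edge pixel)
            edges = [
                (roi[r][0], c),
                (roi[r][cols - 1], cols - 1 - c),
                (roi[0][c], r),
                (roi[rows - 1][c], rows - 1 - r),
            ]
            weights = [1.0 / (d + 1) for _, d in edges]
            total = sum(w * g for w, (g, _) in zip(weights, edges))
            bg[r][c] = total / sum(weights)
    return bg
```

The background-corrected image is then the pixelwise difference between the ROI and the returned estimate.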

As discussed in our previous study, the
texture features derived from the SGLD matrix of the ROI
provided useful texture information for classification of
microcalcification clusters. The SGLD matrix element,
$p_{d,\theta}(i,j)$, is the joint probability of the occurrence of gray
levels $i$ and $j$ for pixel pairs which are separated by a distance
$d$ and at a direction $\theta$. The SGLD matrices were
constructed from the pixel pairs in a subregion of 512 × 512
pixels centered approximately at the center of the cluster in
the background-corrected ROI so that any potential edge
effects caused by background correction would not affect the
texture extraction. We analyzed the texture features in four
directions: $\theta$ = 0°, 45°, 90°, and 135° at each pixel pair
distance $d$. The pixel pair distance was varied from 4 to 40
pixels in increments of 4 pixels. Therefore, a total of 40
SGLD matrices were derived from each ROI. The SGLD
matrix depends on the bin width (or gray-level interval) used
in accumulating the histogram. Based on our previous study,
a bin width of four gray levels was chosen for constructing
the SGLD matrices. This is equivalent to reducing the
gray-level resolution (or bit depth) of the 12-bit image to 10 bits
by eliminating the 2 least significant bits.
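
The accumulation of an SGLD matrix can be sketched as follows. The sketch assumes non-negative integer pixel values, a symmetric matrix in which each pixel pair is counted in both orders, and a diagonal offset of $d$ pixels in both the row and column directions; these conventions and the names `OFFSETS` and `sgld_matrix` are assumptions, not details taken from the text.

```python
from collections import Counter

# Unit (row, column) steps for the four directions, in degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def sgld_matrix(image, d, angle, bin_width=4):
    """Return the SGLD matrix as {(i, j): joint probability}.

    image: list of rows of non-negative integer gray levels.
    Gray levels are binned by integer division with bin_width, which
    for bin_width=4 drops the 2 least significant bits.
    """
    dr, dc = (d * step for step in OFFSETS[angle])
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i = image[r][c] // bin_width
                j = image[r2][c2] // bin_width
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # symmetric counting (assumption)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}
```

Calling the function for the four angles and the ten distances from 4 to 40 pixels yields the 40 matrices per ROI mentioned above.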

Fig. 3. An example of background correction for the ROIs before texture
feature extraction. The ROI from the original image is shown in Fig. 2(a).
(a) The estimated low frequency background gray level, and (b) the ROI
after background correction. The background gray-level variation due to the
varying x-ray penetration in the breast tissue is reduced. The contouring in
the background image is a display artifact that does not exist in the
calculated image file. For display purposes, the background-corrected ROI is
contrast-enhanced to improve the visibility of the microcalcifications and the
detailed structures.

Fig. 4. A schematic diagram of the genetic algorithm designed for feature
selection used in this study. $X_1, \ldots, X_n$ represents the set of parent
chromosomes and $X'_1, \ldots, X'_n$ represents the set of offspring chromosomes.

From each of the SGLD matrices, we derived 13 texture
measures including correlation, entropy, energy (angular
second moment), inertia, inverse difference moment, sum
average, sum entropy, sum variance, difference average,
difference entropy, difference variance, information measure of
correlation 1, and information measure of correlation 2. The
formulation of these texture measures can be found in the
literature. As found in our previous study, we did not
observe a significant dependence of the discriminatory power
of the texture features on the direction of the pixel pairs for
mammographic textures. However, since the actual distance
between the pixel pairs in the diagonal direction was a factor
of $\sqrt{2}$ greater than that in the axial direction, we averaged the
feature values in the axial directions (0° and 90°) and in the
diagonal directions (45° and 135°) separately for each
texture feature derived from the SGLD matrix at a given pixel
pair distance. The average texture features at the ten pixel
pair distances and two directions formed a 260-dimensional
texture feature space.
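
Four of the 13 texture measures, and the averaging over axial and diagonal directions, can be sketched as below. The formulas follow the commonly used definitions of these measures; the base-2 logarithm in the entropy and the names `texture_measures` and `direction_averaged` are assumptions of this sketch.

```python
import math

def texture_measures(p):
    """Compute a subset of the SGLD texture measures.

    p: SGLD matrix as {(i, j): probability}, with probabilities
    summing to 1.
    """
    return {
        "energy": sum(v * v for v in p.values()),
        "entropy": -sum(v * math.log2(v) for v in p.values() if v > 0),
        "inertia": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "inverse_difference_moment": sum(
            v / (1 + (i - j) ** 2) for (i, j), v in p.items()
        ),
    }

def direction_averaged(features_by_angle):
    """Average feature dicts over axial (0, 90) and diagonal (45, 135)."""
    def avg(a, b):
        fa, fb = features_by_angle[a], features_by_angle[b]
        return {name: 0.5 * (fa[name] + fb[name]) for name in fa}
    return {"axial": avg(0, 90), "diagonal": avg(45, 135)}
```

With all 13 measures, ten distances, and two averaged directions, the feature count is 13 × 10 × 2 = 260, which matches the dimensionality stated above.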

D.  Feature  selection 

Feature  selection  is  one  of  the  most  important  steps  in 
classifier  design  because  the  presence  of  ineffective  features 
often  degrades  the  performance  of  a  classifier  on  test 
samples. This is partly caused by the “curse of dimensionality”
problem: the classifier is inadequately trained in a
large-dimension feature space when only a finite number of
training samples is available. We compared two feature
selection methods to extract useful features from the
morphological, texture, and combined feature spaces. One is
a  genetic  algorithm  approach,  and  the  other  is  the  commonly 
used  stepwise  linear  discriminant  analysis  method. 

1.  Genetic  algorithm  for  feature  selection 

The genetic algorithm (GA) methodology was first
introduced by Holland in the early 1970s. A GA solves an
optimization problem based on the principles of natural
selection. In natural selection, a population evolves by finding
beneficial  adaptations  to  a  complex  environment.  The  char¬ 
acteristics  of  a  population  are  carried  onto  the  next  genera¬ 
tion  by  its  chromosomes.  New  characteristics  are  introduced 
into  a  chromosome  by  crossover  and  mutation.  The  probabil¬ 
ity  of  survival  or  reproduction  of  an  individual  depends  more 
or  less  on  its  fitness  to  the  environment.  The  population 
therefore  evolves  toward  better-fit  individuals. 

The application of GA to feature selection has been
described in the literature. We have demonstrated
previously that a GA could select effective features for
classification of masses and normal breast tissue from a very
large-dimension feature space. The GA was adapted to the
current problem for classification of malignant and benign
microcalcifications. A brief outline is given as follows. Each
feature in a given feature space is treated as a gene and is
encoded by a binary digit (bit) in a chromosome. A “1”
represents the presence of the feature and a “0” represents
the absence of the feature. The number of genes (bits) on a
chromosome is equal to the dimensionality ($k$) of the feature
space, but only the features that are encoded as “1” are
actually present in the subset of selected features. A
chromosome therefore represents a possible solution to the feature
selection problem.

The implementation of GA for feature selection is
illustrated in the block diagram shown in Fig. 4. To allow for
diversity, a large number, $n$, of chromosomes, $X_1, \ldots, X_n$, is
chosen as the population. The number of chromosomes is
kept constant in each generation. At the initiation of the GA,
each bit on a chromosome is initialized randomly with a
small but equal probability, $P_{\mathrm{init}}$, to be “1.” The selected
feature subset on a chromosome is used as the input feature
variables to a classifier, which was chosen to be Fisher’s
linear discriminant in this study.

The available samples in the data set are randomly
partitioned into a training set and a test set. The training set is
used to formulate a linear discriminant function with each of
the selected feature subsets. The effectiveness of each of the
linear discriminants for classification is evaluated with the
test set. The classification accuracy is determined as the area,
$A_z$, under the ROC curve. To reduce biases in the classifiers
due to case selection, training and testing are performed a
large number of times, each with a different random
partitioning of the data set. In this study, we chose to partition the
data set 80 times and the 80 test $A_z$ values were averaged and
used for determination of the fitness of the chromosome.

The fitness function for the $i$th chromosome, $F(i)$, is
formulated as

$$F(i) = \frac{f(i) - f_{\min}}{f_{\max} - f_{\min}}, \quad i = 1, \ldots, n, \tag{10}$$

where

$$f(i) = A_z(i) - \alpha N(i),$$

$A_z(i)$ is the average test $A_z$ for the $i$th chromosome over the
80 random partitions of the data set, $f_{\min}$ and $f_{\max}$ are the
minimum and maximum $f(i)$ among the $n$ chromosomes,
$N(i)$ is the number of features in the $i$th chromosome, and $\alpha$
is a penalty factor, whose magnitude is less than $1/k$, to
suppress chromosomes with a large number of selected
features. The value of the fitness function $F(i)$ ranges from 0 to
1. The probability of the $i$th chromosome being selected as a
parent, $P_s(i)$, is proportional to its fitness function:

$$P_s(i) = F(i) \Big/ \sum_{j=1}^{n} F(j). \tag{11}$$

A random sampling based on the probabilities, $P_s(i)$, will
allow chromosomes with higher value of fitness to be
selected more frequently.
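
The fitness scaling and fitness-proportional parent selection can be illustrated with a short sketch. It assumes a linear min-max scaling of the penalized score $A_z(i) - \alpha N(i)$ into the range 0 to 1, and it uses roulette-wheel sampling with replacement; the function names and the handling of a population with identical scores are assumptions.

```python
import random

def fitness(az, n_features, alpha):
    """Scale penalized A_z values to fitness values in [0, 1].

    az: mean test A_z per chromosome; n_features: number of selected
    features per chromosome; alpha: penalty factor.
    """
    f = [a - alpha * n for a, n in zip(az, n_features)]
    f_min, f_max = min(f), max(f)
    if f_max == f_min:
        # Degenerate population: treat every chromosome as equally fit.
        return [1.0] * len(f)
    return [(v - f_min) / (f_max - f_min) for v in f]

def select_parents(population, fit, rng, count):
    """Sample parents with probability proportional to fitness."""
    return rng.choices(population, weights=fit, k=count)
```

Note that the least fit chromosome receives a fitness of exactly 0 under this scaling and is therefore never chosen as a parent.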

For every pair of selected parent chromosomes, $X_i$ and
$X_j$, a random decision is made to determine if crossover
should take place. A uniform random number in (0,1] is
generated. If the random number is greater than $P_c$, the
probability of crossover, then no crossover will occur;
otherwise, a random crossover site is selected on the pair of
chromosomes. Each chromosome is split into two strings at this
site and one of the strings will be exchanged with the
corresponding string from the other chromosome. Crossover
results in two new chromosomes of the same length.

After crossover, another chance of introducing new
features is obtained by mutation. Mutation is applied to each
gene on every chromosome. For each bit, a uniform random
number in (0,1] is generated. If the random number is greater
than $P_m$, the probability of mutation, then no mutation will
occur; otherwise, the bit is complemented. The processes of
parent selection, crossover, and mutation result in a new
generation of $n$ chromosomes, $X'_1, \ldots, X'_n$, which will again be
evaluated with the 80 training and test set partitions as
described above. The chromosomes are allowed to evolve over
a preselected number of generations. The best subset of
features is chosen to be the chromosome that provides the
highest average $A_z$ during the evolution process.
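
The crossover and mutation operators can be sketched as follows, with chromosomes represented as Python lists of 0/1 bits. The sketch draws random numbers from the half-open interval [0, 1) provided by the standard library, which differs from the (0,1] interval in the text only at the endpoints; the function names are hypothetical.

```python
import random

def crossover(x_i, x_j, p_c, rng):
    """Single-point crossover, applied with probability p_c."""
    if rng.random() >= p_c or len(x_i) < 2:
        return x_i[:], x_j[:]
    # Split point strictly inside the chromosome so both parts are non-empty.
    site = rng.randrange(1, len(x_i))
    return x_i[:site] + x_j[site:], x_j[:site] + x_i[site:]

def mutate(chromosome, p_m, rng):
    """Complement each bit independently with probability p_m."""
    return [bit ^ 1 if rng.random() < p_m else bit for bit in chromosome]
```

Both operators preserve the chromosome length, so the population size and the number of gene locations stay constant from one generation to the next.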

In this study, 500 chromosomes were used in the
population. Each chromosome has 281 gene locations. The
initialization probability, $P_{\mathrm{init}}$, was
chosen to be 0.01 so that each chromosome started with two
to three features on the average. We varied the crossover
probability $P_c$ from 0.7 to 0.9, the mutation probability $P_m$
from 0.001 to 0.005, and the penalty factor $\alpha$ from 0 to 0.001. These
ranges of parameters were chosen based on our previous
experience with other feature selection problems using GAs.

2. Stepwise linear discriminant analysis

Stepwise linear discriminant analysis (LDA) is a commonly used method for
selecting useful feature variables from a large feature space. Detailed
descriptions of this method can be found in the literature.^^ The procedure is
briefly outlined below. Stepwise LDA uses a forward selection and backward
removal strategy. When a feature is entered into or removed from the model, its
effect on the separation of the two classes can be analyzed by several
criteria. We use the Wilks' lambda criterion, which minimizes the ratio of the
within-group sum of squares to the total sum of squares of the two class
distributions; the significance of the change in Wilks' lambda is estimated by
F-statistics. In the forward selection step, the features are entered one at a
time. The feature variable that causes the most significant change in Wilks'
lambda is included in the feature set if its F value is greater than the
F-to-enter (F_in) threshold. In the feature removal step, the features already
in the model are eliminated one at a time. The feature variable that causes the
least significant change in Wilks' lambda is excluded from the feature set if
its F value is below the F-to-remove (F_out) threshold. The stepwise procedure
terminates when the F values for all features not in the model are smaller than
the F_in threshold and the F values for all features in the model are greater
than the F_out threshold. The number of selected features decreases if either
the F_in threshold or the F_out threshold is increased. Therefore, the number
of features to be selected can be adjusted by varying the F_in and F_out
values.
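The forward-entry and backward-removal loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS procedure used in the study: the function names (`wilks_lambda`, `partial_f`, `stepwise_select`), the default thresholds, and the use of the textbook two-group partial F statistic, F = (n - 2 - p)(Lambda_p / Lambda_{p+1} - 1), are all choices made for the example.

```python
def _det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def _scatter(rows, cols):
    """Sum-of-squares-and-products matrix of the chosen feature columns."""
    means = [sum(r[c] for r in rows) / len(rows) for c in cols]
    return [[sum((r[a] - ma) * (r[b] - mb) for r in rows)
             for b, mb in zip(cols, means)] for a, ma in zip(cols, means)]

def wilks_lambda(X, y, cols):
    """det(within-group scatter) / det(total scatter); 1.0 for an empty model."""
    if not cols:
        return 1.0
    within = [[0.0] * len(cols) for _ in cols]
    for label in set(y):
        s = _scatter([x for x, t in zip(X, y) if t == label], cols)
        within = [[w + v for w, v in zip(wr, sr)] for wr, sr in zip(within, s)]
    return _det(within) / _det(_scatter(X, cols))

def partial_f(X, y, base, j):
    """F for the change in Wilks' lambda when feature j joins `base` (two classes)."""
    ratio = wilks_lambda(X, y, base) / wilks_lambda(X, y, base + [j])
    return (len(X) - 2 - len(base)) * (ratio - 1.0)

def stepwise_select(X, y, f_in=3.84, f_out=2.71, max_steps=100):
    """Alternate forward entry and backward removal until nothing changes.

    Assumes f_in >= f_out so that a feature that was just entered is not
    removed again in the same pass.
    """
    selected = []
    for _ in range(max_steps):
        changed = False
        outside = [j for j in range(len(X[0])) if j not in selected]
        if outside:
            best = max(outside, key=lambda j: partial_f(X, y, selected, j))
            if partial_f(X, y, selected, best) > f_in:
                selected.append(best)
                changed = True
        if selected:
            def f_remove(j):
                return partial_f(X, y, [k for k in selected if k != j], j)
            worst = min(selected, key=f_remove)
            if f_remove(worst) < f_out:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)
```

Raising `f_in` or `f_out` in this sketch makes entry harder or removal easier, which is why the number of selected features shrinks as either threshold grows.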

E.  Classifier 

The training and testing procedure described above was used for the purpose of
feature selection only. After the best subset of features as determined by
either the GA or the stepwise LDA procedure was found, we performed the
classification as follows.

The linear discriminant analysis^^ procedure in the SPSS software package^^ was
used to classify the malignant and benign microcalcification clusters. We used
a cross-validation resampling scheme for training and testing the classifier.
The data set of 145 samples was randomly partitioned into a training set and a
test set by an approximately 3:1 ratio. The partitioning was constrained so
that ROIs from the same patient were always grouped into the same set. The
training set was used to determine the coefficients (or weights) of the feature
variables in the linear discriminant function. The performance of the trained
classifier was evaluated with the test set. In order to reduce the effect of
case selection, the random partitioning was performed 50 times. The results
were then averaged over the 50 partitions.
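The patient-constrained random partitioning can be illustrated with a short Python sketch. The function names, the greedy way of filling the test set up to roughly one quarter of the samples, and the seed are assumptions made for the example; the text only specifies an approximately 3:1 split, grouping by patient, and 50 repetitions.

```python
import random

def partition_by_patient(patient_ids, test_fraction=0.25, rng=None):
    """Split sample indices so that all samples of a patient land in one set."""
    rng = rng or random.Random()
    patients = sorted(set(patient_ids))
    rng.shuffle(patients)
    target = test_fraction * len(patient_ids)
    test_patients, n_test = set(), 0
    for p in patients:
        if n_test >= target:
            break
        test_patients.add(p)
        n_test += patient_ids.count(p)
    train = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train, test

def repeated_partitions(patient_ids, n_repeats=50, seed=0):
    """Draw several independent patient-level partitions."""
    rng = random.Random(seed)
    return [partition_by_patient(patient_ids, rng=rng) for _ in range(n_repeats)]
```

A classifier would then be trained and tested once per partition, and the per-partition results averaged.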

The classification accuracy of the LDA was evaluated by ROC methodology. The
output discriminant score from the LDA classifier was used as the decision
variable in the ROC analysis. The LABROC program,^^ which assumes binormal
distributions of the decision variable for the two classes and fits an ROC
curve to the classifier output based on maximum-likelihood estimation, was used
to estimate the ROC curve of the classifier. The ROC curve represents the
relationship between the true-positive fraction (TPF) and the false-positive
fraction (FPF) as the decision threshold varies. The area under the ROC curve,
A_z, and the standard deviation of A_z were provided by the LABROC program for
each partition of training and test sets. The average performance of the
classifier was estimated as the average of the 50 test A_z values from the 50
random partitions.
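To make the TPF/FPF relationship concrete, the sketch below computes an empirical ROC curve and its area from two lists of decision scores. Note that this is a different, non-parametric estimate (equivalent to the Mann-Whitney statistic), not the binormal maximum-likelihood fit performed by LABROC; the function names and the convention that a higher score means "more positive" are assumptions of the example.

```python
def roc_points(neg_scores, pos_scores):
    """(FPF, TPF) pairs obtained by sweeping the threshold over all scores."""
    thresholds = sorted(set(neg_scores) | set(pos_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpf = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpf, tpf))
    return points

def empirical_auc(neg_scores, pos_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

Averaging such an area over the 50 test partitions mirrors the way the mean test A_z is formed, although the numerical values would generally differ from the binormal estimates.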

To obtain a single distribution of the discriminant scores for the test
samples, we performed a leave-one-case-out resampling scheme for training and
testing the classifier. In this scheme, one of the 78 cases was left out at a
time and the clusters from the other 77 cases were used for formulation of the
linear discriminant function. The resulting LDA classifier was used to classify
the clusters from the left-out case. The procedure was performed 78 times so
that every case was left out once to be the test case. The test discriminant
scores from all the clusters were accumulated in a distribution which was then
analyzed by the LABROC program. Using the distributions of discriminant scores
for the test samples from the leave-one-case-out resampling scheme, the CLABROC
program could be used to test the statistical significance of the differences
between ROC curves'^® obtained from different conditions. The two-tailed p
value for the difference in the areas under the ROC curves was estimated.
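The leave-one-case-out scheme can be sketched as follows. The helper names and the `fit`/`score` callables are hypothetical placeholders for whatever classifier is used; the point of the example is only that all samples of one case are held out together and that the test scores are pooled into a single list.

```python
def leave_one_case_out(case_ids):
    """Yield (case, train_indices, test_indices), holding out one case at a time."""
    for case in sorted(set(case_ids)):
        train = [i for i, c in enumerate(case_ids) if c != case]
        test = [i for i, c in enumerate(case_ids) if c == case]
        yield case, train, test

def pooled_test_scores(case_ids, features, labels, fit, score):
    """Train on all other cases, score the held-out case, pool the test scores."""
    pooled = [None] * len(case_ids)
    for _, train, test in leave_one_case_out(case_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            pooled[i] = score(model, features[i])
    return pooled
```

Every sample receives exactly one test score, so the pooled list can be analyzed as a single distribution.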


III.  RESULTS 

The variations of best feature set size and classifier performance in terms of
A_z with the GA parameters were tabulated in Table II(a)-(c) for the
morphological, the texture, and the combined feature spaces, respectively. The
number of generations that the chromosomes evolved was fixed at 75

Table II. Dependence of feature selection and classifier performance on GA
parameters: (a) morphological feature space, (b) texture feature space, and
(c) combined feature space. The number of generations that the GA evolved
was fixed at 75. The best result for each feature space is identified with an
asterisk.

(a) Morphological feature space

P_c    P_m     α        No. of features   A_z (Training)   A_z (Test)
0.7    0.001   0         6                0.84             0.79
0.8    0.001   0         3                0.77             0.76
0.9    0.001   0         4                0.80             0.77
0.7    0.003   0         7                0.82             0.78
0.8    0.003   0         6                0.82             0.79
0.9    0.003   0         6                0.84             0.79
0.7    0.001   0.0005    3                0.77             0.76
0.8    0.001   0.0005    4                0.80             0.77
0.9    0.001   0.0005    3                0.77             0.76
0.7    0.003   0.0005    6                0.84             0.79*
0.8    0.003   0.0005    6                0.84             0.79
0.9    0.003   0.0005    6                0.82             0.79
0.7    0.001   0.0010    3                0.77             0.76
0.8    0.001   0.0010    4                0.80             0.77
0.9    0.001   0.0010    3                0.77             0.76
0.7    0.003   0.0010    6                0.84             0.79
0.8    0.003   0.0010    7                0.84             0.79
0.9    0.003   0.0010    4                0.80             0.77

(b) Texture feature space

P_c    P_m     α        No. of features   A_z (Training)   A_z (Test)
0.7    0.001   0         7                0.87             0.82
0.8    0.001   0         8                0.88             0.84
0.9    0.001   0         8                0.88             0.84
0.7    0.003   0        17                0.91             0.82
0.8    0.003   0         9                0.88             0.79
0.9    0.003   0        10                0.88             0.79
0.7    0.001   0.0005    9                0.88             0.85*
0.8    0.001   0.0005    7                0.86             0.82
0.9    0.001   0.0005    8                0.87             0.84
0.7    0.003   0.0005   13                0.90             0.81
0.8    0.003   0.0005   10                0.87             0.81
0.9    0.003   0.0005   12                0.88             0.81
0.7    0.001   0.0010    7                0.87             0.83
0.8    0.001   0.0010    9                0.88             0.83
0.9    0.001   0.0010    8                0.88             0.83
0.7    0.003   0.0010   10                0.88             0.83
0.8    0.003   0.0010   21                0.94             0.82
0.9    0.003   0.0010   12                0.88             0.80

(c) Combined feature space

P_c    P_m     α        No. of features   A_z (Training)   A_z (Test)
0.7    0.001   0        13                0.93             0.88
0.8    0.001   0        12                0.92             0.88
0.9    0.001   0        12                0.92             0.89
0.7    0.003   0        12                0.91             0.86
0.8    0.003   0        16                0.94             0.88
0.9    0.003   0        17                0.95             0.88
0.7    0.001   0.0003   12                0.92             0.87
0.8    0.001   0.0003   12                0.92             0.86
0.9    0.001   0.0003   12                0.93             0.88
0.7    0.003   0.0003   13                0.93             0.87
0.8    0.003   0.0003   13                0.93             0.88
0.9    0.003   0.0003   12                0.94             0.89*
0.7    0.005   0.0003   12                0.89             0.80
0.7    0.001   0.0010   11                0.92             0.87
0.8    0.001   0.0010   10                0.91             0.87
0.9    0.001   0.0010   11                0.91             0.86
0.7    0.003   0.0010   10                0.91             0.86
0.8    0.003   0.0010   14                0.93             0.87
0.9    0.003   0.0010   13                0.92             0.87
0.7    0.005   0.0010   11                0.89             0.81
0.8    0.005   0.0010   12                0.88             0.82
0.9    0.005   0.0010   12                0.89             0.81

Table III. Dependence of feature selection and classifier performance on
F_out and F_in thresholds using stepwise linear discriminant analysis: (a)
morphological feature space, (b) texture feature space, and (c) combined
feature space. The best result for each feature space is identified with an
asterisk. When the test A_z is comparable, the feature set with fewer features
is considered to be better.

(a) Morphological feature space

F_out   F_in    No. of features   A_z (Training)   A_z (Test)
2.7     3.8      2                0.76             0.76
1.7     2.8      4                0.79             0.76
1.7     1.8      6                0.83             0.79*
1.0     1.4
1.0     1.2      7                0.84             0.79
0.8     1.0      9                0.85             0.79
0.6     0.8
0.4     0.6     10                0.85             0.79
0.2     0.4     12                0.86             0.78
0.1     0.2

(b) Texture feature space

F_out   F_in    No. of features   A_z (Training)   A_z (Test)
2.7     3.8      4                0.82             0.80
1.7     2.8
1.0     1.4      8                0.88             0.83
1.0     1.2     10                0.89             0.82
0.8     1.0     11                0.89             0.83
0.6     0.8     14                0.91             0.85*
0.4     0.6     17                0.92             0.84
0.2     0.4     18                0.92             0.81
0.1     0.2     16                0.90             0.80

(c) Combined feature space

F_out   F_in    No. of features   A_z (Training)   A_z (Test)
3.0     3.2      6                0.84             0.80
2.9     3.2
2.8     3.1
2.0     3.1
3.0     3.1     10                0.88             0.83
2.9     3.0
2.7     2.8
2.0     2.3     11                0.90             0.86
2.0     2.2
1.9     2.0
1.7     1.8
1.3     1.5     14                0.92             0.86
1.0     1.2     19                0.95             0.86
1.0     1.1     23                0.96             0.87*
0.8     1.2     28                0.97             0.86

in these tables. The training and test A_z values were obtained by averaging
the results of the 50 partitions of the data set using the selected feature
sets.

The results of feature selection using the stepwise LDA procedure with a range
of F_in and F_out thresholds were tabulated in Table III(a)-(c). The thresholds
were varied so that the number of selected features varied over a wide range.
Often different choices of F_in and F_out values could result in the same
selected feature set, as shown in the tables by the number of features in the
set. The average A_z values obtained from the 50 partitions of the data set
using the selected feature sets were listed. The best feature sets selected in
the different feature spaces are shown in Table IV.

Table IV. The best feature sets selected by the GA and stepwise LDA methods
(indicated by asterisks in Tables II and III) in the three feature spaces. The
number of generations for chromosome evolution in the GA to reach the selected
feature sets is listed. The abbreviations for the texture features are:
correlation (CORE), energy (ENER), entropy (ENTR), difference average (DFAV),
difference entropy (DFEN), difference variance (DFVR), inertia (INER), inverse
difference moment (INVD), information measure of correlation 1 (ICO1),
information measure of correlation 2 (ICO2), sum average (SMAV), sum entropy
(SMEN), sum variance (SMVR). After an abbreviation, the letter "A" indicates
diagonal features and the number indicates the pixel distance. The
abbreviations for the morphological features can be found in Table I.


GA 

Stepwise  LDA 

Morphological 

Texture 

Combined 

generation  39 

generation  64 

generation  169 

Morphological 

Texture 

Combined 

CMVD 

DFAVA  8 

DFAVA_4 

AVMD 

DFAV_12 

CORE_40 

CVMR 

DFEN  16 

DFEN  28 

CVMD 

DFEN_4 

COREA_16 

CVSA 

DFVRA  24 

DFVRA_36 

CVMR 

DFEN_8 

COREA_40 

MXMR 

DFVR  24 

DFVR  12 

CVSA 

DFENA_12 

DFAVA^S 

MXSA 

DFVR_4 

DFVR^20 

MXMR 

DFENA_24 

DFEN_4 

SDMD 

DFVR_8 

ICOlA_20 

MXSA 

DFVR_24 

DFEN_8 

IC01A_12 

IC01A„32 

DFVR_40 

DFENA^36 

IC02A^28 

SMEN_16 

IC01_16 

DFVR_20 

ICO2_40 

SMEN_36 

IC01A^8 

IC01A_28 

AVAR 

ICO2_40 

IC02_24 

CVMD 

INER_8 

IC02_36 

CVSA 

INVD_16 

INER_12 

MXEC 

INVD_4 

INERA_16 

NUMS 

INVDA_8 

INVDA_36 

SDMD 

SMEN_40 

SMENA_4 

AVAR 

CVMD 

CVSA 

MXAR 

MXEC 

NUMS 

SDMD 


Table V compares the training and test A_z values from the best feature set in
each feature space for the two feature selection methods. The GA parameters
that selected the feature set with the best classification performance in each
feature space after 75 generations (Table II) were used to run the GA again for
500 generations. The A_z values obtained with the best GA selected feature sets
after 75 generations are listed together with those obtained after 500
generations. The A_z values obtained with the leave-one-case-out scheme are
also shown in Table V. The differences between the corresponding A_z values
from the two resampling schemes are within 0.01. The two feature selection
methods provided feature sets that had similar test A_z values in the
morphological and texture feature spaces. In the combined feature space, there
was a slight improvement in the test A_z value obtained with the GA selected
features. Although the difference in the A_z


Table V. Classification accuracy of the linear discriminant classifier in the
different feature spaces using feature sets selected by the GA and the stepwise
LDA procedure.

                                    Training A_z                           Test A_z
Feature selection        Morphological  Texture    Combined     Morphological  Texture    Combined

Cross-validation
  GA (75 generations)    0.84±0.04      0.88±0.03  0.94±0.02    0.79±0.07      0.85±0.07  0.89±0.05
  GA (500 generations)   0.84±0.04      0.88±0.03  0.96±0.02    0.79±0.07      0.85±0.07  0.90±0.05
  Stepwise LDA           0.83±0.04      0.91±0.03  0.96±0.02    0.79±0.07      0.85±0.06  0.87±0.06

Leave-one-case-out
  GA (75 generations)    0.83±0.03      0.88±0.03  0.94±0.02    0.79±0.04      0.84±0.03  0.89±0.03
  GA (500 generations)   0.83±0.03      0.88±0.03  0.95±0.02    0.79±0.04      0.84±0.03  0.89±0.03
  Stepwise LDA           0.83±0.03      0.91±0.02  0.96±0.02    0.79±0.04      0.85±0.03  0.87±0.03

Fig. 5. Comparison of ROC curves of the LDA classifier performance using
the best GA selected feature sets in the three feature spaces. In addition, the
ROC curve obtained from the best feature set selected by the stepwise LDA
procedure in the combined feature space is shown. The classification was
performed with a leave-one-case-out resampling scheme.


values from the leave-one-case-out scheme between the two feature selection
methods did not achieve statistical significance (p = 0.2), as estimated by
CLABROC, the differences in the paired A_z values from the 50 partitions
demonstrated a consistent trend (40 out of 50 partitions) that the A_z values
from the GA selected features were higher than those obtained by the stepwise
LDA. This trend was also observed in our previous study in which mass and
normal tissue were classified.^"^ The ROC curves for the test samples using the
feature sets selected by the GA were plotted in Fig. 5. The classification
accuracy in the combined feature space was significantly higher than that in
the morphological (p = 0.002) or the texture feature space (p = 0.04) alone.
The ROC curve using the feature set selected by the stepwise procedure in the
combined feature space was also plotted for comparison. The distribution of the
discriminant scores for the test samples using the feature set selected by the
GA in the combined feature space is shown in Fig. 6(a). If a decision threshold
is chosen at 0.3, 29 of the 82 (35%) benign samples can be correctly classified
without missing any malignant clusters. Some of the 145 samples are different
views of the same microcalcification clusters. In clinical practice, the
decision regarding a cluster is based on information from all views. If it is
desirable to provide the radiologist with a single relative malignancy rating
for each cluster, two possible strategies may be used to merge the scores from
all views: the average score or the minimum score. The latter strategy
corresponds to the use of the highest likelihood of malignancy score for the
cluster. There were a total of 81 different clusters (44 benign and 37
malignant) from the 78 cases because 3 of the cases contained both a benign and
a malignant cluster. The distributions of the average and the minimum
discriminant scores of the 81 clusters in the combined feature space were
plotted in Fig. 6(b) and Fig. 6(c), respectively. Using the average scores, ROC
analysis provided test A_z values of 0.93 ± 0.03 and 0.89 ± 0.04, respectively,
for the GA selected and stepwise LDA selected feature sets. Using the minimum
scores, the test A_z values were 0.90 ± 0.03 and 0.85 ± 0.04, respectively. The
difference between the A_z values from the two feature selection methods did
not achieve statistical significance in either case (p = 0.07 and p = 0.09,
respectively). If a decision threshold is chosen at an average score of 0.2, 22
of the 44 (50%) benign clusters can be correctly identified with 100% correct
classification of the malignant clusters. If a decision threshold is set at a
minimum score of 0.2, 14 of the 44 (32%) benign clusters can be identified at
100% sensitivity.

Fig. 6. Distribution of the discriminant scores for the test samples using the
best GA selected feature set in the combined texture and morphological
feature space. (a) Classification by samples from each film, (b) classification
by cluster using the average scores, (c) classification by cluster using the
minimum scores.
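The merging of per-view scores and the "specificity at 100% sensitivity" figures quoted above can be illustrated with a small Python sketch. The function names are hypothetical, and the score orientation (lower discriminant score means more likely malignant, so a cluster is called benign when its score exceeds the threshold) is an assumption inferred from the statement that the minimum score corresponds to the highest likelihood of malignancy.

```python
from collections import defaultdict

def merge_views(scores, cluster_ids, how="average"):
    """Combine the scores of all views of each cluster into one score."""
    by_cluster = defaultdict(list)
    for s, c in zip(scores, cluster_ids):
        by_cluster[c].append(s)
    if how == "average":
        return {c: sum(v) / len(v) for c, v in by_cluster.items()}
    if how == "minimum":
        return {c: min(v) for c, v in by_cluster.items()}
    raise ValueError("how must be 'average' or 'minimum'")

def specificity_at_full_sensitivity(benign, malignant):
    """Threshold at the highest malignant score; count benign scores above it."""
    threshold = max(malignant)
    spared = sum(b > threshold for b in benign)
    return threshold, spared / len(benign)
```

With this convention, no malignant cluster lies above the returned threshold, and the returned fraction is the share of benign clusters that would be correctly identified.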

IV. DISCUSSION

Fisher's linear discriminant is the optimal classifier if the class
distributions are multivariate normal with equal covariance matrices.^^ Even if
these conditions are not satisfied, as in most classification tasks, the LDA
may still be a preferred choice when the number of available training samples
is small. Our previous investigation'^^’'^^ of the dependence of classifier
performance on design sample size indicated that, in general, the training
performance (resubstitution) of a classifier is positively biased whereas the
test performance (hold-out) is negatively biased by the sample size. The
magnitudes of the biases increase when the dimensionality of the input feature
space or the complexity of the classifier increases, or when the design sample
size decreases. Therefore, the test performance of a linear classifier is
generally better than that of a more complex classifier such as a neural
network or a quadratic classifier when the training sample size is small. The
training results should not be used for comparison of classifier performance
because a classifier can often be overtrained and give a near-perfect
classification on training samples while the generalization to any unknown test
samples is poor. In this study, we evaluated the effectiveness of using the
morphological and the texture features extracted from mammograms for
classification of a microcalcification cluster. Although we expanded the data
set from our previous study, the current data set was still relatively small.
We therefore chose to use a linear discriminant classifier for this
classification task. Stepwise feature selection or a GA was used to reduce the
dimensionality of the feature space.
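For reference, the two-class linear discriminant discussed above is commonly written in the following textbook form. The symbols are standard notation introduced here for illustration and are not taken from this paper: \boldsymbol{\mu}_1 and \boldsymbol{\mu}_2 are the class mean vectors, \boldsymbol{\Sigma} is the common covariance matrix (estimated in practice by the pooled within-class covariance), and P_1, P_2 are the prior probabilities.

```latex
h(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_0, \qquad
\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2), \qquad
w_0 = -\tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)^{T}
      \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)
      + \ln\frac{P_1}{P_2}
```

A sample is assigned to class 1 when h(\mathbf{x}) > 0. Only the mean vectors and one covariance matrix have to be estimated, which is why the number of free parameters stays small compared with a quadratic classifier or a neural network.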

In the morphological feature space, the features related to three
characteristics, mean density, the moment ratio, and the signal area, were
chosen most often. The features related to axis ratio, eccentricity, and the
number of microcalcifications in a cluster were chosen only when they were
combined with texture features. These results indicate the usefulness of
classification in multi-dimensional feature spaces. Some features that are not
useful by themselves can become effective features when they are combined with
other features. The results also indicate that all six characteristics of the
microcalcifications designed for this task have some discriminatory power to
distinguish malignant and benign microcalcifications. The morphological
features are not as effective as the texture features. This is evident from the
smaller A_z values in the morphological feature space. However, when the
morphological feature space is combined with the texture feature space, the
resulting feature set selected from the combined feature space can
significantly improve the classification accuracy, in comparison with those
from the individual feature spaces.

The SGLD texture features characterize the shape of the SGLD matrix and
generally contain information about the image properties such as homogeneity,
contrast, the presence of organized structures, as well as the complexity and
gray-level transitions within the image."^^ As an example, the entropy feature
measures the uniformity of the SGLD matrix. The entropy value is maximum when
all the matrix elements are equal. The entropy value is small when large matrix
elements concentrate in a small region of the SGLD matrix while the other
matrix elements are relatively small. Therefore, large entropy represents a
large but random variation of pixel values in an image without regular
structures, whereas small entropy represents an image with relatively uniform
pixel values if the SGLD matrix peaks along the diagonal and an image with
regular texture patterns if it peaks off the diagonal. The ambiguity may be
resolved when the sum entropy and difference entropy measures are analyzed.
Unlike morphological features, it is difficult, in general, to find the direct
relationship between a texture measure and the structures seen on an image, and
often a combination of several texture measures extracted at different angles
and pixel pair distances is required to describe a texture pattern. It may also
be noted that some textures can only be described by second-order statistics
and may not be distinguishable by human eyes. The feature selection methods are
used to empirically find the combination of features that can most effectively
distinguish the malignant and benign lesions.
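The behavior of the entropy feature described above can be checked on toy images. The sketch below builds a normalized gray-level co-occurrence (SGLD) matrix for one pixel offset and computes its entropy; the function names, the non-symmetric counting of pixel pairs, and the base-2 logarithm are assumptions of the example rather than details taken from the paper.

```python
import math

def sgld_matrix(image, dx, dy, levels):
    """Normalized co-occurrence matrix for pixel pairs separated by (dx, dy).

    Assumes integer gray levels in range(levels) and an offset small enough
    that at least one pixel pair exists.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def entropy(p):
    """Entropy of a normalized matrix; zero entries contribute nothing."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)
```

A constant image puts all the mass on one diagonal element (entropy 0), a checkerboard row pattern puts it on two off-diagonal elements (small entropy, regular texture), and a matrix with all elements equal gives the maximum entropy.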

From Table IV, it can be seen that many of the features in the best feature
sets selected by the GA method and the stepwise LDA method are similar. In the
morphological feature space, five of the six selected features are the same in
the two feature sets. In the combined feature space, six morphological features
(out of six and seven morphological features in the two sets, respectively) are
the same. For the texture features, there are more variations in the features
selected by the two methods. However, the differences are mainly in the pixel
distances and the directions of the features, while the major types of the
texture features are similar. For example, four types of texture features,
energy, entropy, sum average, and sum variance, were not selected in either the
texture or the combined feature space by either method. Another four types of
texture features, difference average, difference entropy, difference variance,
and information measure of correlation 1, were chosen in each case, and
information measure of correlation 2 was chosen in three of the four cases.
Inertia and inverse difference moment were selected by the stepwise LDA method
in both the texture and the combined feature spaces. Sum entropy was selected
by both methods in the combined feature space. These results indicate that some
features are more effective than the others for distinguishing benign and
malignant microcalcifications. The pixel distance and the direction of the
texture features may be considered to be higher order effects that have less
influence on the discriminatory ability of a given type of texture measure. The
smaller differences in their discriminatory ability would subject them to
greater variability of being chosen in the feature selection processes. It may
also be noted that many of the features are highly correlated. The correlated
features can be interchanged in a classifier model without a strong effect on
its performance.

The GA solves an optimization problem based on a search guided by the fitness
function. Ideally, the values chosen for the P_m, P_c, and α parameters in the
GA affect only the convergence rate, and the GA will eventually evolve to the
same global maximum. However, when the dimensionality of the feature space is
very large and the design samples are sparse, the GA often reaches local maxima
corresponding to different feature sets, as can be seen in Table II. Similarly,
the stepwise feature selection may reach a different local maximum and choose a
feature set different from those chosen by the GA. The different feature sets
may provide different or similar performance. The latter is often a result of
the correlation among the features, as described above.
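The roles of the crossover probability, the mutation probability, and the fitness function in a GA-based feature search can be illustrated with a generic sketch. The encoding (one bit per feature), roulette-wheel selection, single-point crossover, population size, and the example fitness with a penalty `alpha` on the number of selected features are all assumptions of this illustration; the GA actually used in the study is described elsewhere in the paper and may differ in each of these respects.

```python
import random

def ga_select(n_features, fitness, pc=0.8, pm=0.003, pop_size=20,
              generations=75, seed=0):
    """Evolve bit-string chromosomes (1 = feature selected); return the best seen."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        low = min(scores)
        # Shift the scores so that every selection weight is positive.
        weights = [s - low + 1e-9 for s in scores]
        nxt = []
        while len(nxt) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            a, b = a[:], b[:]
            if rng.random() < pc and n_features > 1:
                cut = rng.randrange(1, n_features)  # single-point crossover
                a[cut:], b[cut:] = b[cut:], a[cut:]
            for child in (a, b):
                for i in range(n_features):
                    if rng.random() < pm:           # bit-flip mutation
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):
            best = cand[:]
    return best

def penalized(score, alpha):
    """Wrap a score so that each selected feature costs `alpha` (hypothetical form)."""
    return lambda bits: score(bits) - alpha * sum(bits)
```

Because the search is stochastic, different parameter settings or seeds can end in different local maxima, which is the behavior described in the paragraph above.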

For the linear discriminant classifier, the stepwise LDA procedure can select
near-optimal features for the classification task. We have shown that the GA
could select a feature set comparable to or slightly better than that selected
by the stepwise LDA. The number of generations that the GA had to evolve to
reach the best selection increased with the dimensionality of the feature space
as expected. However, even in a 281-dimensional feature space, it only took 169
generations to find a better feature set than that selected by stepwise LDA.
Further search up to 500 generations did not find other feature combinations
with better performance. Although the difference in A_z did not achieve
statistical significance, probably due to the large standard deviation in A_z
when the number of case samples in the ROC analysis was small, the improvements
in A_z in this and our previous studies^'* indicate that the GA is a useful
feature selection method for classifier design. One of the advantages of
GA-based feature selection is that it can search for near-optimal feature sets
for any type of linear or nonlinear classifier, whereas the stepwise LDA
procedure is more tailored to linear discriminant classifiers. Furthermore, the
fitness function in the GA can be designed such that features with specific
characteristics are favored. One of the applications in this direction is to
select features to design a classifier with high sensitivity and high
specificity for classification of malignant and benign lesions. Although the GA
requires much longer computation time than the stepwise LDA to search for the
best feature set, the flexibility of the GA makes it an increasingly popular
alternative for solving machine learning and optimization problems. Since
feature selection is performed only during training of a classifier, the speed
of a trained classifier for processing test cases is not affected by the choice
of the feature selection method. Therefore, the longer computation time of the
GA is not a problem in practice if the GA can provide a better feature set for
a given classification task.

V.  CONCLUSIONS 

In this study, we evaluated the effectiveness of morphological and texture
features extracted from mammograms for classification of malignant and benign
microcalcification clusters. We also compared a GA-based feature selection
method and a stepwise feature selection procedure based on linear discriminant
analysis. It was found that the best feature set was selected from the combined
morphological and texture feature space by the GA-based method. A linear
discriminant classifier using the best feature set and a properly chosen
decision threshold could correctly identify 35% of the benign clusters without
missing any malignant clusters. If the average discriminant score from all
views of the same cluster was used for classification, the accuracy improved to
50% specificity at 100% sensitivity. Alternatively, if the minimum discriminant
score from all views of the same cluster was used, the accuracy would be 32%
specificity at 100% sensitivity. This information may be used to reduce
unnecessary biopsies, thereby improving the positive predictive value of
mammography. Although these results were obtained with a relatively small data
set, they demonstrate the potential of using CAD techniques to analyze
mammograms and to assist radiologists in making diagnostic decisions. Further
studies will be conducted to evaluate the generalizability of our approach in
large data sets.

Computerized Radiographic Mass Detection — Part I:
Lesion Site Selection by Morphological Enhancement
and Contextual Segmentation

Huai  Li,  Yue  Wang,  K.  J.  Ray  Liu*,  Shih-Chung  B.  Lo,  and  Matthew  T.  Freedman 


Abstract — This paper presents a statistical model supported approach for
enhanced segmentation and extraction of suspicious mass areas from
mammographic images. With an appropriate statistical description of various
discriminant characteristics of both true and false candidates from the
localized areas, improved mass detection may be achieved in computer-assisted
diagnosis (CAD). In this study, one type of morphological operation is derived
to enhance disease patterns of suspected masses by cleaning up unrelated
background clutter, and a model-based image segmentation is performed to
localize the suspected mass areas using a stochastic relaxation labeling
scheme. We discuss the importance of model selection when a finite generalized
Gaussian mixture is employed, and use information theoretic criteria to
determine the optimal model structure and parameters. Examples are presented to
show the effectiveness of the proposed methods on mass lesion enhancement and
segmentation when applied to mammographic images. Experimental results
demonstrate that the proposed method achieves a very satisfactory performance
as a preprocessing procedure for mass detection in CAD.

Index Terms — Finite mixture, image enhancement, image segmentation,
information criterion, morphological filtering, relaxation labeling.

I. Introduction

IN RECENT years, several computer-assisted diagnosis (CAD) schemes for mass
detection and classification have been developed [1]-[13]. Though it may be
difficult to compare the relative performance of these methods, because the
reported performance strongly depends on the degree of subtlety of masses in
the selected database, accurate selection of suspected masses is considered a
critical first step due to the variability of normal breast tissue and the
lower contrast and ill-defined margins of masses [3], [6], and since no subtle
masses should be missed before any further analysis.

A number of image processing techniques have been proposed to perform
suspicious mass site selection. Kobatake et al. [1] proposed using an iris
filter to detect tumors as suspicious regions with very weak contrast to their
background. Sameti et al. [7] used fuzzy sets to partition the mammographic
image data. Lau and Yin et al. independently proposed using bilateral
subtraction to determine possible mass locations [9], [13]. Some other
investigators proposed using pixel-based feature segmentation of spiculated
masses [4], [8]. Kegelmeyer has reported promising results for detecting
spiculated tumors based on local edge characteristics and Laws texture features
[8]. Karssemeijer et al. [4] proposed to identify stellate distortions by using
the orientation map of line-like structures. Recently, Petrick et al. [6]
proposed a two-stage adaptive density-weighted contrast enhancement filtering
technique along with edge detection and morphological feature classification
for automatic segmentation of potential masses. Kupinski and Giger [3]
presented a radial gradient index-based algorithm and a probabilistic algorithm
for seeded lesion segmentation.

Nevertheless, to the best of our knowledge, little work has been
dedicated to improving the task of lesion site selection, although it is
a crucial step in CAD. In particular, few studies have
used and justified model-based image processing techniques for
unsupervised lesion site selection [11]. Zwiggelaar et al. developed
a statistical model to describe and detect the abnormal pattern
of linear structures of spiculated lesions [2]. In their work,
the probability density function of the observation vectors for
each class is assumed to be normal; in our experience,
the “normal” distribution assumption does not hold for each class. Li et al.
proposed using a Markov random field model to extract suspicious
masses for mass detection [11]. In their study, most of the model
parameters were chosen empirically, and the mammogram was
segmented into three regions (background, fat, and parenchymal
or tumors).

Stochastic model-based image segmentation is a technique
for partitioning an image into distinctive meaningful regions
based on the statistical properties of both gray level and context
images. A good segmentation result depends on suitable
model selection for a specific image modality [16], [17], where
model selection refers to the determination of both the number
of image regions and the local statistical distributions of each
region. Furthermore, a segmentation result can be improved
when the pattern of interest is enhanced before it is segmented. The only
assumption for suspected mass site selection is that suspected
mass areas should be brighter than the surrounding breast tissues,
which is valid for most real cases. When masses
lie within an inhomogeneous pattern of fibroglandular
tissue, or are partially or completely surrounded by fibroglandular
tissue, enhancement of mass-related signals is important.

Fig. 1. Major components in CAD.

Fig. 1 shows a general block diagram of CAD systems. This
paper focuses on the “image processing” block, whose task is to
automatically pick up all possible lesion sites. We address two essential
issues in stochastic model-based image segmentation:
enhancement and model selection. Based on the differential geometric
characteristics of masses against the background tissues,
we propose one type of morphological operation to enhance the
mass patterns on mammograms. Then we employ a finite generalized
Gaussian mixture (FGGM) distribution to model the
histogram of the mammograms, where the statistical properties
of the pixel images are largely unknown and are to be incorporated.
We combine the EM algorithm with two information
theoretic criteria to determine the optimal number of image
regions and the kernel shape in the FGGM model. Finally, we
apply a contextual Bayesian relaxation labeling (CBRL) technique
to perform the selection of suspected masses. The major
differences of our work from the previous work [1]-[6], [8]-[13]
are as follows.

1) We present a new algorithm of morphological filtering
for image enhancement in which the combined operations
are applied to the original gray tone image, and more
sensitive lesion site selection is observed on the enhanced
images.

2) We justify and pilot test the FGGM distribution in modeling
mammographic pixel images together with a model
selection procedure based on the two information theoretic
criteria. This allows an automatic identification of
both the number ($K$) and kernel shape ($\alpha$) of the
distributions of tissue types.

3) We develop a new algorithm (CBRL) for segmenting
mass areas that achieves results comparable to
those of Markov random field model-based
approaches with much less computational complexity.

The presentation of this paper is organized as follows. In Section II,
the proposed dual morphological operation enhancement
technique is described in detail. The theory and algorithm on
FGGM modeling, model selection, and parameter estimation
are presented in Section III. This is followed by a discussion
on the selection of suspicious masses using the CBRL approach.
Evaluation results are given and discussed in Section IV. Finally,
the paper is concluded in Section V.

II.  Morphological  Enhancement 

One of the main difficulties in suspicious mass segmentation
is that mammographic masses often overlap with dense
breast tissues. Therefore, it is necessary to remove the bright background
caused by dense breast tissues while preserving the features
and patterns related to the masses. For this purpose, background
correction is an important step for mass segmentation.
We propose a mass pattern-dependent background removal approach
using morphological operations.

A.  Morphological  Filtering  Theory 

Morphological operations can be employed for many image
processing purposes, including edge detection, region segmentation,
and image enhancement. The beauty and simplicity of the
mathematical morphology approach come from the fact that a
large class of filters can be represented as the combination of
two simple operations: erosion and dilation. Let $Z$ denote the
set of integers and $f(i, j)$ denote a discrete image signal, where
the domain set is given by $(i, j) \in N_1 \times N_2$, $N_1 \times N_2 \subset Z^2$,
and the range set by $\{f\} \in N_3$, $N_3 \subset Z$. A structuring element
$B$ is a subset in $Z^2$ with a simple geometrical shape and size.
Denote $B^s = \{-b : b \in B\}$ as the symmetric set of $B$ and
$B_{t_1, t_2}$ as the translation of $B$ by $(t_1, t_2)$, where $(t_1, t_2) \in Z^2$.
The erosion $f \ominus B^s$ and dilation $f \oplus B^s$ can be expressed as
[19]

$$(f \ominus B^s)(i, j) = \min_{(t_1, t_2) \in B_{i, j}} f(t_1, t_2) \tag{1}$$

$$(f \oplus B^s)(i, j) = \max_{(t_1, t_2) \in B_{i, j}} f(t_1, t_2). \tag{2}$$

On the other hand, opening $f \circ B$ and closing $f \bullet B$ are defined
as [19]

$$(f \circ B)(i, j) = ((f \ominus B^s) \oplus B)(i, j) \tag{3}$$

$$(f \bullet B)(i, j) = ((f \oplus B^s) \ominus B)(i, j). \tag{4}$$

A gray value image can be viewed as a two-dimensional surface
in a three-dimensional space. Given an image, the opening
operation removes objects of positive intensity that are smaller than the
structuring element. Thus, with a specified
structuring element, one can extract different image contexts
by taking the difference between the original and the opened
image, which is known as the “tophat” operation [19].

B. Morphological Enhancement Algorithms

Based on the properties of morphological filters, we designed
a mass pattern-dependent enhancement approach.
The algorithm is implemented by dual morphological tophat operations
followed by a subtraction, as described below.

Step 1) The textures without the pattern information of interest
are extracted by a tophat operation

$$r_1(i, j) = \max(0, [f(i, j) - (f \circ B_1)(i, j)]) \tag{5}$$

where $f(i, j)$ is the original image, and $r_1(i, j)$ is
the residue image between the original image and the
opening of the original image by a specified structuring
element $B_1$. The size of $B_1$ should be chosen
smaller than the size of masses.

Step 2) Let $r_2(i, j)$ be the mass pattern enhanced image by
background correction, i.e., by the second tophat operation
on $f(i, j)$

$$r_2(i, j) = \max(0, [f(i, j) - (f \circ B_2)(i, j)]) \tag{6}$$

where $B_2$ is a specified structuring element which
is larger than the masses.

Step 3) The enhanced image $f_1(i, j)$ can be derived as

$$f_1(i, j) = \max(0, [r_2(i, j) - r_1(i, j)]). \tag{7}$$

This operation is called “dual morphological operation.” It
can remove the background noise and the structure noise inside
the suspected mass patterns. Fig. 2 shows the mass patch and
the enhanced results of each step using the dual morphological
operation. As we can see from Fig. 2, both background correction
[Fig. 2(c)] and dual morphological operation [Fig. 2(d)]
enhanced the mass pattern, but dual morphological operation
removed more structural noise inside the mass region, which in
turn would improve the mass segmentation results.
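The three steps in (5)-(7) can be illustrated on a one-dimensional intensity profile. The following is only a minimal sketch under stated assumptions, not the implementation used in this study: it uses flat structuring elements given as odd window widths (rather than disks on a two-dimensional mammogram), truncates the window at the borders, and runs on made-up intensity values. All function names are hypothetical.

```python
def erode(signal, width):
    """Flat erosion: minimum over a centered window (truncated at the borders)."""
    half, n = width // 2, len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, width):
    """Flat dilation: maximum over a centered window (truncated at the borders)."""
    half, n = width // 2, len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def opening(signal, width):
    """Opening = erosion followed by dilation, as in (3)."""
    return dilate(erode(signal, width), width)


def tophat(signal, width):
    """Residue between the signal and its opening, clipped at zero, as in (5) and (6)."""
    return [max(0, s - o) for s, o in zip(signal, opening(signal, width))]


def dual_tophat(signal, small_width, large_width):
    """Dual morphological operation (7): background-corrected signal minus texture."""
    r1 = tophat(signal, small_width)   # texture narrower than the small window
    r2 = tophat(signal, large_width)   # everything narrower than the large window
    return [max(0, b - a) for a, b in zip(r1, r2)]


# Illustrative profile (assumed values): flat background, a one-sample texture
# spike, and a five-sample mass-like bump.
profile = [10] * 21
profile[3] = 15
for i in range(9, 14):
    profile[i] = 18

enhanced = dual_tophat(profile, small_width=3, large_width=9)
```

In this toy profile the spike is narrower than the small window and is removed as texture, the flat background is removed by the large window, and only the bump survives, which is the behavior the two structuring element sizes are meant to produce.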


III. Model-Based Segmentation
A.  Statistical  Modeling 

Given a digital image consisting of $N_1 \times N_2$ pixels, assume
this image contains $K$ regions. By randomly reordering all
pixels in the underlying probability space, one can treat pixel
labels as random variables and introduce a prior probability
measure $\pi_k$. Then the FGGM probability density function (pdf)
of gray level of each pixel is given by [17]

$$p(x_i) = \sum_{k=1}^{K} \pi_k p_k(x_i), \quad i = 1, \ldots, N_1 N_2, \quad x_i = 0, 1, \ldots, L - 1 \tag{8}$$

where $x_i$ is the gray level of pixel $i$, and $L$ is the number of gray
levels. The $p_k(x_i)$ are conditional region pdfs with the weighting
factor $\pi_k$, satisfying $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$. The generalized
Gaussian pdf given region $k$ is defined by

$$p_k(x_i) = \frac{\alpha \beta_k}{2\Gamma(1/\alpha)} \exp\left[-\left|\beta_k (x_i - \mu_k)\right|^{\alpha}\right], \quad \alpha > 0, \quad \beta_k = \frac{1}{\sigma_k}\left[\frac{\Gamma(3/\alpha)}{\Gamma(1/\alpha)}\right]^{1/2} \tag{9}$$

where $\mu_k$ is the mean, $\Gamma(\cdot)$ is the Gamma function, and $\beta_k$ is a
parameter related to the variance $\sigma_k^2$. It can be shown that when


Fig. 2. Original and enhancement result of the mass patch using
dual-morphological operation. (a) Original image block $f(i, j)$. (b)
Textures $r_1(i, j)$. (c) Background correction result $r_2(i, j)$. (d) Enhanced
result $f_1(i, j)$.


$\alpha = 2.0$, one has the Gaussian pdf; when $\alpha = 1.0$, one has the
Laplacian pdf. When $\alpha \gg 1$, the distribution tends to a uniform
pdf; when $\alpha < 1$, the pdf becomes sharp. Therefore, the generalized
Gaussian model is a suitable model to fit the histogram
distribution of those images whose statistical properties are unknown,
since the kernel shape can be controlled by selecting different
$\alpha$ values.
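The role of the shape parameter can be checked numerically. The sketch below evaluates the generalized Gaussian kernel of (9) using only the Python standard library; the function name and the test values are illustrative assumptions rather than part of the original method description.

```python
import math


def generalized_gaussian_pdf(x, mean, sigma, alpha):
    """Generalized Gaussian pdf of (9) with mean `mean`, standard deviation `sigma`,
    and kernel shape `alpha` (alpha > 0)."""
    # beta rescales the kernel so that its variance stays sigma**2 for any alpha.
    beta = math.sqrt(math.gamma(3.0 / alpha) / math.gamma(1.0 / alpha)) / sigma
    norm = alpha * beta / (2.0 * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(beta * (x - mean)) ** alpha)


# Compare the alpha = 2 case with the closed-form Gaussian at an arbitrary point.
x, mean, sigma = 0.3, 0.0, 1.0
gaussian = math.exp(-(x - mean) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
difference = abs(generalized_gaussian_pdf(x, mean, sigma, 2.0) - gaussian)
```

With `alpha = 2.0` the value agrees with the normal density up to rounding, and with `alpha = 1.0` it agrees with a Laplacian of the same variance, which matches the special cases stated above.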

The whole image can be well approximated by an independent
and identically distributed random field $X$. The
corresponding joint pdf is

$$P(\mathbf{x}) = \prod_{i=1}^{N_1 N_2} \sum_{k=1}^{K} \pi_k p_k(x_i) \tag{10}$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_{N_1 N_2}]$, $\mathbf{x} \in X$, and $p_k(x_i)$ is given
in (9). Based on the joint probability measure of pixel images,
the likelihood function under FGGM modeling can be expressed
as $\mathcal{L}(\mathbf{r}) = P_{\mathbf{r}}(\mathbf{x})$, where $\mathbf{r}: \{K, \alpha, \pi_k, \mu_k, \sigma_k, k = 1, \ldots, K\}$
denotes the model parameter set.

B.  Model  Identification 

With an appropriate system likelihood function, the objective
of model identification is to estimate the model parameters by
maximizing the likelihood function, or equivalently minimizing
the relative entropy between the image histogram $p_x(u)$ and
the estimated pdf $p_r(u)$, where $u$ is the gray level. Based on
the FGGM model, the EM algorithm is applied to estimate the
model parameters. The EM algorithm is an iterative technique
for maximum-likelihood (ML) estimation [20]. Recently, it has
been used in many medical imaging applications [15]. Instead
of directly evaluating the value of ML, we use the global relative
entropy (GRE) between the histogram and the estimated
FGGM distribution to measure the performance of parameter
estimation, given by

$$\mathrm{GRE}(p_x \| p_r) = \sum_{u} p_x(u) \log \frac{p_x(u)}{p_r(u)}. \tag{11}$$
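As a small illustration of (11), the sketch below computes the GRE between a normalized histogram and a model pdf sampled at the same gray levels. It assumes both inputs are lists of equal length that each sum to one, and that the model is strictly positive wherever the histogram is nonzero; empty histogram bins are skipped by the usual convention that 0 log 0 = 0. The function name and the two-bin example are hypothetical.

```python
import math


def global_relative_entropy(histogram, model):
    """GRE of (11): sum over gray levels u of p_x(u) * log(p_x(u) / p_r(u))."""
    total = 0.0
    for p_x, p_r in zip(histogram, model):
        if p_x > 0.0:                      # empty bins contribute nothing
            total += p_x * math.log(p_x / p_r)
    return total


# Two-bin toy example (assumed values): the GRE is zero only for a perfect fit.
histogram = [0.5, 0.5]
gre_perfect = global_relative_entropy(histogram, [0.5, 0.5])
gre_mismatch = global_relative_entropy(histogram, [0.25, 0.75])
```

A smaller GRE indicates that the fitted mixture follows the histogram more closely, which is how the quantity is used as a fit measure in the remainder of this section.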

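In the same spirit as the conventional EM algorithm
for finite normal mixtures (FNMs), we formulated an EM algorithm
to estimate the parameter values of the FGGM. The algorithm
is summarized as follows.

EM Algorithm:

1) For $\alpha = \alpha_{\min}, \ldots, \alpha_{\max}$

• $m = 0$, given initialized $\mathbf{r}^{(0)}$

• E-step: for $i = 1, \ldots, N_1 N_2$, $k = 1, \ldots, K$,
compute the probabilistic membership

$$z_{ik}^{(m)} = \frac{\pi_k^{(m)} p_k(x_i)}{\sum_{k=1}^{K} \pi_k^{(m)} p_k(x_i)} \tag{12}$$

• M-step: for $k = 1, \ldots, K$, compute the updated
parameter estimates

$$\pi_k^{(m+1)} = \frac{1}{N_1 N_2} \sum_{i=1}^{N_1 N_2} z_{ik}^{(m)}$$

$$\mu_k^{(m+1)} = \frac{1}{N_1 N_2 \pi_k^{(m+1)}} \sum_{i=1}^{N_1 N_2} z_{ik}^{(m)} x_i$$

$$\sigma_k^{2(m+1)} = \frac{1}{N_1 N_2 \pi_k^{(m+1)}} \sum_{i=1}^{N_1 N_2} z_{ik}^{(m)} \left(x_i - \mu_k^{(m+1)}\right)^2 \tag{13}$$

• When $|\mathrm{GRE}^{(m)}(p_x \| p_r) - \mathrm{GRE}^{(m+1)}(p_x \| p_r)| < \epsilon$
is satisfied, go to Step 2. Otherwise, $m = m + 1$ and
go to E-step.

2) Compute GRE, and go to Step 1.

3) Choose the optimal $\hat{\mathbf{r}}$ which corresponds to the minimum
GRE.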
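The E-step and M-step can be sketched for a fixed kernel shape and a fixed number of regions. This is only an illustrative sketch: it operates on a short list of gray-level samples instead of an image, replaces the GRE stopping rule and the outer loop over $\alpha$ with a fixed iteration count, uses the moment-style updates of the mean and variance, and adds a small variance floor for numerical safety. The function names, the sample values, and the initial parameters are all assumptions.

```python
import math


def gg_pdf(x, mean, sigma, alpha):
    """Generalized Gaussian kernel of (9)."""
    beta = math.sqrt(math.gamma(3.0 / alpha) / math.gamma(1.0 / alpha)) / sigma
    return alpha * beta / (2.0 * math.gamma(1.0 / alpha)) * math.exp(-abs(beta * (x - mean)) ** alpha)


def em_fggm(samples, means, sigmas, weights, alpha, iterations=20):
    """Run a fixed number of EM iterations for a K-component FGGM with fixed alpha."""
    means, sigmas, weights = list(means), list(sigmas), list(weights)
    n, k_regions = len(samples), len(means)
    for _ in range(iterations):
        # E-step: probabilistic membership z[i][k] of sample i in region k.
        z = []
        for x in samples:
            joint = [weights[k] * gg_pdf(x, means[k], sigmas[k], alpha) for k in range(k_regions)]
            total = sum(joint)
            z.append([value / total for value in joint])
        # M-step: update the weight, mean, and standard deviation of each region.
        for k in range(k_regions):
            mass = sum(z[i][k] for i in range(n))
            weights[k] = mass / n
            means[k] = sum(z[i][k] * samples[i] for i in range(n)) / mass
            variance = sum(z[i][k] * (samples[i] - means[k]) ** 2 for i in range(n)) / mass
            sigmas[k] = math.sqrt(max(variance, 1e-6))  # floor is an added safeguard
    return weights, means, sigmas


# Assumed toy data: two well-separated clusters of gray levels.
samples = [44, 46, 48, 50, 52, 54, 56, 144, 146, 148, 150, 152, 154, 156]
weights, means, sigmas = em_fggm(samples, means=[40.0, 160.0], sigmas=[20.0, 20.0],
                                 weights=[0.5, 0.5], alpha=3.0)
```

On this toy data the estimates settle at the two cluster centers with equal weights. On real histograms the result depends on the initialization, which is why the procedure above is repeated over a range of $\alpha$ values and compared by GRE.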


As we mentioned in Section I, the two important parameters
in model selection are $K$ and $\alpha$. Determination of the region
parameter $K$ directly affects the quality of the resulting model
parameter estimation and, in turn, affects the result of segmentation.
In this paper we propose an approach to determine the
value of $K$ based on two popular information theoretic criteria
introduced by Akaike [23] and by Rissanen [24]. Akaike proposed
to select the model that gives the minimum Akaike information
criterion (AIC), defined by

$$\mathrm{AIC}(K) = -2 \log(\mathcal{L}(\hat{\mathbf{r}}_{ML})) + 2 K_a \tag{14}$$

where $\hat{\mathbf{r}}_{ML}$ is the ML estimate of the model parameter set $\mathbf{r}$,
and $K_a$ is the number of free adjustable parameters in the model
[15], [23]. The AIC criterion will select the correct number of the
image regions $K_0$ when

$$K_0 = \arg\left\{\min_{1 \le K \le K_{MAX}} \mathrm{AIC}(K)\right\}. \tag{15}$$

Rissanen addressed the problem from a quite different point
of view. He reformulated the problem explicitly as an information
coding problem in which the best model fitness is
measured such that it assigns high probabilities to the observed
data while at the same time the model itself is not too complex to
describe [24]. The model is selected by minimizing the total description
length defined by minimum description length (MDL)

$$\mathrm{MDL}(K) = -\log(\mathcal{L}(\hat{\mathbf{r}}_{ML})) + 0.5 K_a \log(N_1 N_2). \tag{16}$$

Similarly, the correct number of the distinctive image regions
$K_0$ will be estimated when

$$K_0 = \arg\left\{\min_{1 \le K \le K_{MAX}} \mathrm{MDL}(K)\right\}. \tag{17}$$

C. Bayesian Relaxation Labeling

Once the FGGM model is given, a segmentation problem is
the assignment of labels to each pixel in the image. A straightforward
way is to label pixels into different regions by maximizing
the individual likelihood function $p_k(x)$. This approach
is called the ML classifier, which is equivalent to a multiple
thresholding method. Usually, this method does not achieve good
performance because it does not use local neighborhood information
in making a decision. The CBRL algorithm
[25] is one of the approaches that can incorporate the local
neighborhood information into the labeling procedure and thus
improve the segmentation performance. In this study, we developed
the CBRL algorithm to perform/refine pixel labeling based
on the localized FGGM model, which is defined as follows.

Let $\partial i$ be the neighborhood of pixel $i$ with an $m \times m$ template
centered at pixel $i$. An indicator function is used to represent the
local neighborhood constraints $R_{ij}(l_i, l_j) = I(l_i, l_j)$, where $l_i$
and $l_j$ are labels of pixels $i$ and $j$, respectively. Note that pairs
of labels are now either compatible or incompatible. Similar to
reference [25], one can compute the frequency of neighbors of
pixel $i$ which have the same label value $k$ as at pixel $i$

$$\pi_k^{(i)} = p(l_i = k \mid \mathbf{l}_{\partial i}) = \frac{1}{m^2 - 1} \sum_{j \in \partial i,\, j \ne i} I(k, l_j) \tag{18}$$

where $\mathbf{l}_{\partial i}$ denotes the labels of the neighbors of pixel $i$. Since
$\pi_k^{(i)}$ is a conditional probability of a region, the localized FGGM
pdf of gray level $x_i$ at pixel $i$ is given by

$$p(x_i \mid \mathbf{l}_{\partial i}) = \sum_{k=1}^{K} \pi_k^{(i)} p_k(x_i) \tag{19}$$

where $p_k(x_i)$ is given in (9). Assuming gray values of the image
are conditionally independent, the joint pdf of $\mathbf{x}$, given the context
labels $\mathbf{l}$, is

$$P(\mathbf{x} \mid \mathbf{l}) = \prod_{i=1}^{N_1 N_2} \sum_{k=1}^{K} \pi_k^{(i)} p_k(x_i) \tag{20}$$

where $\mathbf{l} = (l_i : i = 1, \ldots, N_1 N_2)$.

TABLE I
Distribution of the Effective Size of the 186 Masses Used in This Study. The Effective Size Is Defined as the Square Root of
the Product of the Maximum and Minimum Diameters of the Mass

Effective size     0-5 mm   6-10 mm   11-15 mm   16-20 mm   21-25 mm   26-30 mm
Number of masses   3        55        78         29         17         4

It is known that the CBRL algorithm can obtain a consistent labeling
solution based on the localized FGGM model (19). Since
$\mathbf{l}$ represents the labeled image, it is consistent if $S_i(l_i) \ge S_i(k)$
for all $k = 1, \ldots, K$ and for $i = 1, \ldots, N_1 N_2$ [25], where

$$S_i(k) = \pi_k^{(i)} p_k(x_i). \tag{21}$$

Now we can define

$$A(\mathbf{l}) = \sum_{i=1}^{N_1 N_2} \left( \sum_{k} I(l_i, k) S_i(k) \right) \tag{22}$$

as the average measure of local consistency, and

$$LC_i = \sum_{k} I(l_i, k) S_i(k), \quad i = 1, \ldots, N_1 N_2 \tag{23}$$

represents the local consistency based on $\mathbf{l}$. The goal is to find
a consistent labeling $\mathbf{l}$ which can maximize (22). In practice,
each local consistency measure $LC_i$ can be maximized
independently. In [25], it has been shown that when
$R_{ij}(l_i, l_j) = R_{ji}(l_j, l_i)$, if $A(\mathbf{l})$ attains a local maximum at
$\mathbf{l}$, then $\mathbf{l}$ is a consistent labeling.

Based on the localized FGGM model, $l_i^{(0)}$ can be initialized
by the ML classifier

$$l_i^{(0)} = \arg\left\{\max_{k} p_k(x_i)\right\}, \quad k = 1, \ldots, K. \tag{24}$$

Then, the order of pixels is randomly permuted and each label
$l_i$ is updated to maximize $LC_i$, i.e., pixel $i$ is classified into the $k$th
region if

$$l_i = \arg\left\{\max_{k} \pi_k^{(i)} p_k(x_i)\right\}, \quad k = 1, \ldots, K \tag{25}$$

where $p_k(x_i)$ is given in (9) and $\pi_k^{(i)}$ is given in (18). By considering
(24) and (25), we developed a modified CBRL algorithm
as follows.

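CBRL Algorithm:

1) Given $\mathbf{l}^{(0)}$, $m = 0$

2) Update pixel labels

• Randomly visit each pixel for $i = 1, \ldots, N_1 N_2$

• Update its label $l_i$ according to

$$l_i^{(m)} = \arg\left\{\max_{k} \pi_k^{(i)(m)} p_k(x_i)\right\}$$

3) When the fraction of pixels whose labels changed in the last pass
(the number of changed labels divided by $N_1 N_2$) is below 1%,
stop; otherwise, $m = m + 1$, and repeat Step 2.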
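The relaxation loop can be sketched on a small synthetic image. The code below is a minimal illustration under several assumptions and is not the implementation evaluated in this paper: the image is a list of lists, the region parameters are supplied directly instead of being estimated by EM, the local frequency is normalized by the number of neighbors actually inside the image (so border pixels use fewer than $m^2 - 1$ neighbors), and the random visiting order uses a fixed seed. All names and pixel values are hypothetical.

```python
import math
import random


def gg_pdf(x, mean, sigma, alpha):
    """Generalized Gaussian kernel of (9)."""
    beta = math.sqrt(math.gamma(3.0 / alpha) / math.gamma(1.0 / alpha)) / sigma
    return alpha * beta / (2.0 * math.gamma(1.0 / alpha)) * math.exp(-abs(beta * (x - mean)) ** alpha)


def ml_classify(image, means, sigmas, alpha):
    """ML initialization (24): choose the region with the largest p_k(x_i)."""
    labels = []
    for row in image:
        labels.append([])
        for x in row:
            like = [gg_pdf(x, m, s, alpha) for m, s in zip(means, sigmas)]
            labels[-1].append(like.index(max(like)))
    return labels


def cbrl(image, means, sigmas, alpha, window=3, max_passes=10, seed=0):
    """Relabel pixels by the local frequency times the region likelihood, as in (25)."""
    rows, cols, half = len(image), len(image[0]), window // 2
    labels = ml_classify(image, means, sigmas, alpha)
    order = [(r, c) for r in range(rows) for c in range(cols)]
    rng = random.Random(seed)
    for _ in range(max_passes):
        rng.shuffle(order)                 # random visiting order
        changed = 0
        for r, c in order:
            # Count neighbor labels in the window, excluding the pixel itself (18).
            counts = [0] * len(means)
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    if (rr, cc) != (r, c):
                        counts[labels[rr][cc]] += 1
            neighbors = sum(counts)
            scores = [counts[k] / neighbors * gg_pdf(image[r][c], means[k], sigmas[k], alpha)
                      for k in range(len(means))]
            best = scores.index(max(scores))
            if best != labels[r][c]:
                labels[r][c] = best
                changed += 1
        if changed / (rows * cols) < 0.01:  # fewer than 1% of labels changed
            break
    return labels


# Assumed toy image: dark left half, bright right half, one noisy dark-side pixel.
image = [[50, 50, 50, 150, 150, 150] for _ in range(6)]
image[2][1] = 105
initial = ml_classify(image, [50.0, 150.0], [30.0, 30.0], 3.0)
final = cbrl(image, [50.0, 150.0], [30.0, 30.0], 3.0)
```

In this example the ML classifier assigns the noisy pixel to the bright region because its gray level is closer to that mean, while the relaxation step reassigns it to the dark region because none of its neighbors carry the bright label.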


IV. Experimental Results and Discussion

In this section, we present the results of using the morphological
filtering and model-based segmentation approach we
have introduced for enhancement and segmentation of suspicious
masses in mammographic images. In addition to the qualitative
assessment by the radiologists, we introduce several objective
measures to assess the performance of the algorithms we
have proposed for enhancement and segmentation.

A testing data set of 200 mammograms and two simulated
tone images were used to test and evaluate the performance of
the algorithms in this study. The mammograms were selected
from the Mammographic Image Analysis Society (MIAS) database
and the Brooke Army Medical Center (BAMC) database
created by the Department of Radiology at Georgetown University
Medical Center. Of the 200 mammograms, 50 are normal,
and each of the 150 abnormal mammograms
contains at least one mass of varying size, subtlety, and
location. The areas of suspicious masses were identified by an
expert radiologist based on visual criteria and biopsy-proven
results. The total data set includes 113 benign and 73 malignant
masses. The distribution of the masses in terms of size
is shown in Table I. The BAMC films were digitized with a
laser film digitizer (Lumiscan 150) at a pixel size of 100 μm ×
100 μm and 4096 gray levels (12 bits). Before the method was
applied, the digital mammograms were smoothed by averaging
4 × 4 pixels into one pixel. According to radiologists, the size
of small masses is 3-15 mm in effective diameter. A 3-mm
object in an original mammogram spans 30 pixels in a
digitized image with a 100-μm resolution. After reducing the
image size by a factor of four, the object will span
about seven to eight pixels. An object with a size of seven
pixels is expected to be detectable by any computer algorithm.
Therefore, the shrinking step is applicable for mass cases and
can save computation time.

Experimental Evaluation of Morphological Enhancement:
In order to justify the suitability of the morphological
structuring elements, the geometric properties of the contexts
and textures in mammograms were studied. The basic idea
is to keep all mass-like objects within a certain size range and
remove all others by using the proposed morphological filters
with specific structuring elements. At the resolution of 400
μm, a disk with a diameter of seven pixels was chosen as the
morphological structuring element $B_1$ to extract textures in
mammograms. Since the smallest masses are seven pixels in
diameter at the resolution of 400 μm, this procedure would
not destroy mass information. For the purpose of background
correction, a disk with a diameter of 75 pixels was used as
the morphological structuring element $B_2$. An object with a
diameter of 75 pixels corresponds to 30 mm in the original
mammogram. This indicates that all masses with sizes up to
30 mm can be enhanced by background correction. Masses
larger than 30 mm are rare in the clinical setting. In the
last stage of our approach, we applied morphological opening
and closing filtering using a disk with a diameter of five pixels to
eliminate small objects which also contribute to texture noise.
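The pixel counts quoted above follow from the digitization pixel size and the 4 × 4 averaging. The helper below is a small sketch of that arithmetic; the function name and argument names are hypothetical, and the only inputs are the figures stated in the text.

```python
def size_in_pixels(size_mm, pixel_um, shrink=1):
    """Number of pixels spanned by an object of `size_mm` millimeters when the
    digitized pixel size is `pixel_um` micrometers and the image is reduced by
    an integer factor `shrink` along each axis."""
    return size_mm * 1000.0 / (pixel_um * shrink)


smallest_mass_full = size_in_pixels(3, 100)               # 3 mm at 100 um
smallest_mass_reduced = size_in_pixels(3, 100, shrink=4)  # after 4 x 4 averaging
largest_mass_reduced = size_in_pixels(30, 100, shrink=4)  # 30 mm at 400 um
```

The reduced-resolution values, 7.5 pixels for a 3-mm mass and 75 pixels for a 30-mm mass, correspond to the seven-pixel and 75-pixel disk diameters chosen for $B_1$ and $B_2$.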
Fig. 3. (a) Original simulated test image for model selection ($K_0 = 4$, SNR = 10 dB) and (b) the AIC/MDL curves in model selection ($\alpha = 3.0$).


All testing mammograms were processed using the proposed
enhancement approach with the suggested structuring elements
$B_1$ and $B_2$. Fig. 5 shows processed mammogram examples
using the morphological enhancement. Comparing the enhanced
results [Fig. 5(b) and (d)] with the original mammograms
[Fig. 5(a) and (c)], the proposed method not only enhanced all
suspected mass patterns and reduced the texture noise, but also
removed the background noise. In summary, the proposed morphological
enhancement approach can enhance mass patterns
and remove texture structure noise. For dense mammograms,
such as the second example in Fig. 5(c) and (d), where the mass
is obscured by dense fibroglandular tissues, our experience
shows that applying the dual morphological operation to remove
the fibroglandular tissue background is useful. In addition
to the visual evaluation by the radiologist, we performed the
segmentation on both the enhanced mammograms and the original
mammograms to assess the effectiveness of the morphological
filtering.

Simulated Evaluation of Segmentation Algorithms: The
performance of model selection using two frequently used
methods, i.e., the AIC and MDL [22], was first tested and
compared in the simulation study. The computer-generated data
was made up of four overlapping normal components. Each
component represents one local region. The value for each
component was set to a constant value, and noise with a normal
distribution was then added to this simulated digital phantom.
Three noise levels with different variance were set to keep the
same signal-to-noise ratio (SNR), where SNR is defined by

$$\mathrm{SNR} = 10 \log_{10} \frac{(\Delta\mu)^2}{\sigma^2} \tag{26}$$

where $\Delta\mu$ is the mean difference between regions, and $\sigma^2$ is
the noise power. The original data for the simulation study are

Fig. 4. Image segmentation by CBRL on simulated image (with initialization
by ML classification). (a) ML initialization. (b) First iteration in CBRL. (c)
Second iteration in CBRL. (d) Third iteration in CBRL.

TABLE II
Comparison of CBRL, ICM, and MICM Algorithm: Simulated Data

Item                   CBRL Result   ICM Result   MICM Result
Classification Error   0.7935%       0.7508%      0.3113%

Fig. 5. Examples of mass enhancement. (a) Original mammogram. (b) Enhanced mammogram. (c) and (d) Another original mammogram and its enhanced result.

given in Fig. 3(a). The AIC and MDL curves, as functions of the
number of local clusters $K$, are plotted in Fig. 3(b). According
to the information theoretic criteria, the minima of these curves
indicate the correct number of the local regions. From this
experimental figure, it is clear that the number of local regions
suggested by both criteria is correct.
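The selection rule behind such curves, i.e., evaluating (14) and (16) for each candidate $K$ and taking the minimum, can be sketched as follows. The log-likelihood values below are made up for illustration and are not results from this study, and the free-parameter count $K_a = 3K - 1$ (one mean and one standard deviation per region plus $K - 1$ independent weights) is an assumption of this sketch. The function names are hypothetical.

```python
import math


def aic(log_likelihood, free_params):
    """Akaike information criterion of (14)."""
    return -2.0 * log_likelihood + 2.0 * free_params


def mdl(log_likelihood, free_params, n_pixels):
    """Minimum description length of (16)."""
    return -log_likelihood + 0.5 * free_params * math.log(n_pixels)


def select_k(scores):
    """Return the K with the smallest criterion value."""
    return min(scores, key=scores.get)


# Hypothetical maximized log-likelihoods for K = 2..6 on a 100 x 100 image.
log_likelihoods = {2: -5200.0, 3: -5050.0, 4: -5000.0, 5: -4998.0, 6: -4997.0}
n_pixels = 100 * 100

aic_scores = {k: aic(ll, 3 * k - 1) for k, ll in log_likelihoods.items()}
mdl_scores = {k: mdl(ll, 3 * k - 1, n_pixels) for k, ll in log_likelihoods.items()}
k_aic, k_mdl = select_k(aic_scores), select_k(mdl_scores)
```

In this made-up example the log-likelihood barely improves beyond four regions, so the penalty terms dominate and both criteria select $K = 4$. MDL penalizes extra parameters more strongly than AIC whenever $\log(N_1 N_2) > 4$.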

For the validation of image segmentation using CBRL, we
apply the algorithm first to a simulated image. We use the ML
classifier to initialize image segmentation, i.e., to initialize the
quantified image by selecting the pixel label with the largest likelihood
at each node. The classification error after initialization is
uniformly distributed over the spatial domain, as shown in Fig. 4(a).
Our experience suggests this to be a very suitable starting point
for contextual relaxation labeling [21]. The CBRL is then
performed to fine tune the image segmentation. Since
the ground truth is known in this simulated
experiment, the percentage of total classification error is used as
the criterion for evaluating the performance of the segmentation
technique. In Fig. 4(a)-(d), the initial segmentation by the ML
classification and the stepwise results of three iterations in the
CBRL are presented. In this experiment, algorithm initialization
results in an average classification error of 30%. It can
be clearly seen that a dramatic improvement is obtained after
several iterations of the CBRL by using local constraints
determined by the context information. In addition, the convergence
is fast: after the first iteration, most of
the misclassifications are removed. We have also implemented
two other independent and popular algorithms, namely, the
iterated conditional mode (ICM) and the modified iterated
conditional mode (MICM) algorithms, so as to assess the comparative
performance of the segmentation results among different
approaches [21], [22]. The only assumption made by these
three methods is the Markovian property of the context images,
which can be well justified by the underlying cell oncology
and pathology. We have applied these three algorithms to the
same testing image, and the corresponding classification errors
are presented in Table II. The final percentage of classification
errors for Fig. 4(d) is 0.7935%. From this experimental comparison,
it can be concluded that the three algorithms achieved comparable
segmentation accuracy and that the result produced by the
MICM algorithm is the best, though the CBRL algorithm has the
lowest computational complexity. It should be
noted that since the MICM algorithm uses an inhomogeneous
configuration of the Markov random field, its superior performance
is reasonable.

TABLE III
Computed AICs for the FGGM Model with Different $\alpha$

K     $\alpha = 1.0$    $\alpha = 2.0$    $\alpha = 3.0$    $\alpha = 4.0$
2     651250            650570            650600            650630
3     646220            644770            645280            646200
4     645760            644720            645260            646060
5     645760            644700            645120            646040
6     645740            644670            645110            645990
7     645640            644600            645090            645900
8     645550 (min)      644570 (min)      645030 (min)      645850 (min)
9     645580            644590            645080            645880
10    645620            644600            645100            645910

TABLE IV
Computed MDLs for the FGGM Model with Different $\alpha$

K     $\alpha = 1.0$    $\alpha = 2.0$    $\alpha = 3.0$    $\alpha = 4.0$
2     651270            650590            650630            650660
3     646260            644810            645360            646350
4     645860            644770            645280            646150
5     645850            644770            645280            646100
6     645790            644750            645150            646090
7     645720            644700            645120            645930
8     645680 (min)      644690 (min)      645100 (min)      645900 (min)
9     645710            644710            645140            645930
10    645790            644750            645180            645960

On Model-Based Segmentation — Real Case Study: In the real case study,
we used two information criteria (AIC and MDL) to determine K. Tables III
and IV show the AIC and MDL values of the FGGM model for different K and α,
based on one original mammogram. As can be seen from Tables III and IV,
all AIC and MDL values reach their minimum at K = 8, regardless of α.
This indicates that AIC and MDL are relatively insensitive to changes
in α. With this observation, we can decouple K and α and choose an
appropriate value for one while fixing the other. Fig. 6(a) and (b) show
two examples of AIC and MDL curves for different K with α fixed at 3.0.
Fig. 6(a) is based on the original mammogram and Fig. 6(b) on the enhanced
mammogram. As Fig. 6(a) shows, both criteria reach their minimum at
K = 8. Although no ground truth is available in this case, our extensive
numerical experiments have shown very consistent performance of the model
selection procedure, and all the conclusions are strongly supported by the
independent work reported in [14]. Fig. 6(b) indicates that K = 4 is the
appropriate choice for the mammogram enhanced by the dual morphological
operation. This is believed to be reasonable since the number of regions
decreases after background correction.

Fig. 6. AIC and MDL curves with different numbers of regions K. (a) Result
based on the original mammogram; the optimal K = 8. (b) Result based on the
enhanced mammogram; the optimal K = 4.
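The selection rule described above (pick the K at which the information criterion is smallest, separately for each α) can be sketched as follows. This is a minimal illustration that uses the MDL values transcribed from Table IV; the names `MDL`, `K_VALUES`, and `select_k` are hypothetical and not part of the original work.

```python
# MDL values transcribed from Table IV.
# Keys are the shape parameter alpha; list position 0 corresponds to K = 2.
K_VALUES = list(range(2, 11))
MDL = {
    1.0: [651270, 646260, 645860, 645850, 645790, 645720, 645680, 645710, 645790],
    2.0: [650590, 644810, 644770, 644770, 644750, 644700, 644690, 644710, 644750],
    3.0: [650630, 645360, 645280, 645280, 645150, 645120, 645100, 645140, 645180],
    4.0: [650660, 646350, 646150, 646100, 646090, 645930, 645900, 645930, 645960],
}


def select_k(criterion_values, k_values=K_VALUES):
    """Return the K whose criterion value (AIC or MDL) is smallest."""
    best_index = min(range(len(criterion_values)),
                     key=criterion_values.__getitem__)
    return k_values[best_index]


# One selected K per alpha; the table marks K = 8 as the minimum in every column.
best_k = {alpha: select_k(values) for alpha, values in MDL.items()}
print(best_k)
```

Because the selected K does not change across the four columns, the choice of K can be made first and α tuned afterwards, which is the decoupling argument made in the text.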

We fixed K = 8 and varied the value of α when estimating the FGGM model
parameters using the proposed EM algorithm with the original mammogram.
The GRE value between the histogram and the estimated FGGM distribution
was used as a measure of the estimation bias. We found that the GRE
reached its minimum when the FGGM parameter α = 3.0, as shown in Fig. 7.
A similar result was obtained when we applied the EM algorithm to the
enhanced mammogram with K = 4. This indicates that the FGGM model might
be better than the FNM model (α = 2.0) for modeling mammographic images
when the true statistical properties of mammograms are generally unknown,
although the FNM has been chosen most often in previous work [15].
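To make the role of α concrete, the sketch below evaluates a generalized Gaussian kernel and checks that α = 2.0 reproduces the ordinary normal density, which is why the FNM appears as a special case of the FGGM. The parameterization used here (a scale β tied to σ through gamma functions) is the common textbook form and is an assumption; the model definition given earlier in the paper should be treated as authoritative. The function names are hypothetical.

```python
import math


def generalized_gaussian_pdf(u, mean, sigma, alpha):
    """Generalized Gaussian density with shape parameter alpha (assumed form).

    beta is chosen so that the standard deviation equals sigma for any alpha.
    """
    beta = math.sqrt(math.gamma(3.0 / alpha) / math.gamma(1.0 / alpha)) / sigma
    norm = alpha * beta / (2.0 * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(beta * (u - mean)) ** alpha)


def normal_pdf(u, mean, sigma):
    """Ordinary Gaussian density, used only for comparison."""
    z = (u - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# alpha = 2.0 should coincide with the normal density at any point.
difference = abs(generalized_gaussian_pdf(1.3, 1.0, 2.0, 2.0)
                 - normal_pdf(1.3, 1.0, 2.0))

# A crude Riemann sum confirms that the alpha = 3.0 kernel integrates to about 1.
step = 0.001
area = sum(generalized_gaussian_pdf(-10.0 + i * step, 0.0, 1.0, 3.0)
           for i in range(20001)) * step
```

Larger α gives a flatter top and lighter tails than the Gaussian, so letting α vary lets the mixture adapt to the histogram shape of each tissue region.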

TABLE V

Comparison of Segmentation Error Resulting from Noncontextual
and Contextual Methods

Method | Soft Classification | Bayesian Classification | CBRL
GRE Value | 0.0067 | 0.4406 | 0.1^

After all model parameters were determined, every pixel of the image was
assigned a region label (from 1 to K) by the CBRL algorithm. We then
selected the brightest region, which corresponds to label K, together with
a criterion of closed isolated area, as the candidate region of suspicious
masses. According to visual inspection by the radiologists, the results
are over-segmented when K − 1 is used instead of K, and under-segmented
when K + 1 is used. To quantify the performance differences between
segmentation methods, several groups have suggested that the segmentation
results be compared against radiologists’ outlines of the lesions [3].
Though the proposed comparison measures are quantitative, the performance
measures are still qualitative, since the reference base (e.g., the gold
standard set by the radiologists) is qualitative, subjective, and
imperfect. Therefore, in this model-supported approach, in addition to
visual inspection by the radiologists, we have also introduced an
objective measure to assess the performance of the segmentation: the GRE
between the histogram of the pixel image $p_x(u)$ and the FGGM of the
segmented image $p_{x,l}(u)$, defined by

$$\mathrm{GRE}\big(p_x(u)\,\|\,p_{x,l}(u)\big) = \sum_u p_x(u)\,\log\frac{p_x(u)}{p_{x,l}(u)} \qquad (27)$$

where l is the context image estimated by the segmentation algorithm.
Considering that the ergodic theorem is the most fundamental principle in
detection and estimation theory, it is believed that when a good
segmentation is achieved, the distance between $p_x(u)$ and $p_{x,l}(u)$
should be minimized, and this measure links the image text and its sample
averages. Our experience has suggested that this postsegmentation measure
may be a suitable objective criterion for evaluating the quality of image
segmentation in a fully unsupervised situation [22], [26]-[28]. Table V
shows our evaluation data from three different segmentation methods when
applied to the real images.

Fig. 8. Suspected mass segmentation results. (a) Result based on the
original mammogram. (b) Result based on the enhanced mammogram, K = 4,
α = 3.0. (c) and (d) Results based on another original mammogram and its
enhanced image.

Fig. 10. Examples of normal dense mammogram. (a) Original mammogram.
(b) Segmentation result based on the original mammogram. (c) Enhanced
mammogram. (d) Result based on the enhanced mammogram, K = 4, α = 3.0.
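The post-segmentation measure in (27) can be sketched in a few lines, assuming the GRE takes the usual relative-entropy form between the normalized image histogram and the distribution implied by the segmented image. The function name and the toy distributions are hypothetical; no values from Table V are reproduced here.

```python
import math


def global_relative_entropy(p, q):
    """Sum over gray levels u of p(u) * log(p(u) / q(u)).

    p is the normalized image histogram, q the model distribution implied by
    the segmentation. Gray levels with p(u) == 0 contribute nothing; a level
    with p(u) > 0 but q(u) == 0 makes the distance infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same gray levels")
    total = 0.0
    for pu, qu in zip(p, q):
        if pu > 0.0:
            if qu <= 0.0:
                return math.inf
            total += pu * math.log(pu / qu)
    return total


# Toy two-level example: identical distributions give 0, a mismatch gives > 0.
same = global_relative_entropy([0.5, 0.5], [0.5, 0.5])
mismatch = global_relative_entropy([0.5, 0.5], [0.25, 0.75])
```

A smaller value means the segmented image explains the observed histogram better, which is how the three methods in Table V are ranked.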

Performance of Combined Morphological Filtering and Model-Based
Segmentation using a Larger Database: The proposed segmentation method
was used to extract suspicious mass regions from the 200 testing
mammograms. Without enhancement, a total of 1142 potential mass regions
were isolated, including 114 of the 186 true masses. With enhancement, a
total of 3143 potential mass regions were extracted, including 181 of the
186 true masses. The results demonstrate that more true masses were picked
up after enhancement, although more false cases were also included. The
undetected areas mainly occurred at the lower-intensity side of shaded
objects or were obscured by fibroglandular tissue; they were, however,
extracted from the morphologically enhanced mammograms. In addition, when
the margins of masses are ill defined, only parts of the suspicious masses
were extracted from the original mammograms. For the purpose of “lesion
site selection,” we believe that sensitivity should be the sole criterion
for evaluating the performance of the method. The sensitivity is 181/186
with enhancement versus 114/186 without. Our method is unsupervised and
automatic and does not involve any detection effort at this stage. To the
best of our knowledge, there is no objective criterion available for
evaluating image enhancement performance before a detection effort is
involved. We claim only that the enhancement step is important and
effective with respect to the purpose of “lesion site selection.”
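The counts reported above translate into sensitivities and false-region rates as in the following worked sketch. It assumes that each detected true mass corresponds to exactly one extracted region, so that the remaining regions can be counted as false candidates; that assumption, and the function name, are illustrative and not stated in the text.

```python
def site_selection_summary(n_regions, n_true_found, n_true_total, n_images):
    """Return (sensitivity, false candidate regions per image)."""
    sensitivity = n_true_found / n_true_total
    # Assumption: one extracted region per detected true mass.
    false_regions_per_image = (n_regions - n_true_found) / n_images
    return sensitivity, false_regions_per_image


# Counts taken from the text: 200 mammograms, 186 true masses.
without_enhancement = site_selection_summary(1142, 114, 186, 200)
with_enhancement = site_selection_summary(3143, 181, 186, 200)

print("without enhancement: sensitivity %.3f, false regions/image %.2f"
      % without_enhancement)
print("with enhancement:    sensitivity %.3f, false regions/image %.2f"
      % with_enhancement)
```

Under that assumption, enhancement raises sensitivity from roughly 61% to roughly 97% while the number of false candidate regions per image roughly triples, which is the trade-off the follow-on classification stage is meant to resolve.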

Fig. 8 demonstrates some segmentation results based on the original and
enhanced mammograms. We compared the segmentation results based on the
enhanced mammogram (K = 4 and α = 3.0) with those based on the original
mammogram (K = 8 and α = 3.0), as shown in Fig. 8. Comparing the results
in Fig. 8(b) with those in Fig. 8(a), we can see that after enhancement a
more accurate region was detected for the suspected mass, which has an
ill-defined margin. Obtaining an accurate suspected region is crucial,
since geometric features are extracted from the suspected regions and
these features are very important for subsequent true mass detection. In
addition, we observed that one suspected mass was missed in Fig. 8(a) but
was detected in Fig. 8(b). As mentioned in Section I, none of the
suspected masses should be missed in the segmentation step. Fig. 8(c) and
(d) demonstrate the segmentation of a suspected mass that lies in dense
breast tissue. As shown in Fig. 8(c), the whole fibroglandular tissue area
was segmented when the original mammogram was used. After enhancement, the
suspected region was segmented exactly, as shown in Fig. 8(d).

Fig. 11. Comparison results of segmentation based on the enhanced
mammograms. Black outlines denote the computer-segmented results. White
outlines denote the radiologist-segmented results.

We have also included the segmentation results on normal mammograms.
Fig. 9 demonstrates the segmentation results based on the original and
enhanced mixed fatty and glandular mammograms. Fig. 10 demonstrates the
segmentation results based on the original and enhanced dense mammograms.
We would like to emphasize that the objective of this paper is to provide
a segmentation technique that can enhance and extract potential mass sites
from the background, so that the related mass patterns can be accurately
characterized through focused feature selection and analysis. The method
will of course produce many mass-like areas, but this is an acceptable
outcome: an accurate description of nonmass cases characterized by
mass-like sites will benefit the follow-on detection step, where the
performance of the classifier depends on an accurate separation of mass
and nonmass in the feature space. The details will be described in [29].

To evaluate the performance of the segmentation method, we used both
simulated studies and expert visual inspection to validate the methods and
results. The radiologist concluded that the lesion characteristics are
better displayed after the proposed enhancement and that all possible
lesion areas were successfully identified. In addition to the visual
inspection, we measured the overlap between the computer-segmented and the
radiologist-segmented mass regions to evaluate our method. Fig. 11 shows
the comparison results of segmentation based on the enhanced mammograms.
Fig. 11 includes 60 benign and malignant mass patches that were cut from
the whole mammograms after the segmentation. The white outline was drawn
by the radiologist, while the black outline was produced by the computer
and superimposed upon the original image. As can be seen from Fig. 11, for
most cases the ratio of the mutual overlap area of the
radiologist-segmented and computer-segmented mass regions to the
radiologist-segmented mass area is larger than 50%. In addition, even the
poorest result picked the true lesion in the correct location and depicted
the characteristics of the mass reasonably. It is important to understand
that “lesion area segmentation” is not our objective, so there is no
“best” or “worst” segmentation result. Our objective is “lesion site
selection” with the highest possible sensitivity through a global
unsupervised enhancement and segmentation scheme.
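The overlap ratio used above (mutual overlap area divided by the radiologist-segmented area) can be sketched on binary masks as follows. The masks are small made-up examples stored as nested lists; the function name is hypothetical.

```python
def overlap_ratio(computer_mask, radiologist_mask):
    """Overlap area divided by the radiologist-segmented area.

    Both masks are equally sized 2-D lists of 0/1 values, where 1 marks a
    pixel inside the segmented mass region.
    """
    overlap = 0
    reference = 0
    for comp_row, rad_row in zip(computer_mask, radiologist_mask):
        for comp_pixel, rad_pixel in zip(comp_row, rad_row):
            if rad_pixel:
                reference += 1
                if comp_pixel:
                    overlap += 1
    if reference == 0:
        raise ValueError("radiologist mask contains no mass pixels")
    return overlap / reference


# Made-up 3 x 4 patch: the radiologist outlines 4 pixels, and the computer
# region covers 3 of them plus 2 pixels outside the outline.
radiologist = [[0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]]
computer = [[0, 0, 1, 1],
            [0, 1, 1, 1],
            [0, 0, 0, 0]]
ratio = overlap_ratio(computer, radiologist)
```

Note that this ratio is normalized by the radiologist's area only, so pixels the computer marks outside the outline do not lower it; that is consistent with a goal of site selection rather than exact area segmentation.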

V.  Conclusion 

In this paper, we propose a combined method that uses morphological
operations, FGGM modeling, and CBRL to enhance and segment various breast
tissue textures and suspicious mass lesions in mammographic images. This
phase is a crucial step in mass detection for an improved CAD. We
emphasized the importance of model selection, which includes the selection
of the number of image regions K and the selection of the FGGM kernel
shape controlled by α. The experimental results indicate that the
selection of suspected mass sites can be affected by K and α. We proposed
the EM algorithm together with information theoretic criteria to determine
the optimal K and α. With the optimal K and α, the segmentation results
can be significantly improved. We also showed that with the proposed
pattern-dependent enhancement algorithm using morphological operations,
subtle masses can be segmented more accurately than when the original
image is used for extraction without enhancement. To summarize,
morphological filtering enhancement combined with stochastic model-based
segmentation is an effective way to extract suspicious mammographic
patterns of interest, and may thereby improve the overall performance of
mammographic CAD of breast cancer.


Acknowledgment 

The  authors  would  like  to  thank  Z.  Gu  of  the  Lombardi 
Cancer  Center  and  I.  Sesterhenn  of  the  Armed  Forces  Institute 
of  Pathology  for  their  scientific  input  on  the  knowledge  of  cell 
oncology  and  pathology,  and  R.  Shah  MD,  Director  of  Breast 
Imaging,  BAMC  for  his  evaluation  of  cases  to  our  database. 


References 

[1] H. Kobatake, M. Murakami, H. Takeo, and S. Nawano, “Computerized
detection of malignant tumors on digital mammograms,” IEEE Trans.
Med. Imag., vol. 18, pp. 369-378, May 1999.

[2] R. Zwiggelaar, T. C. Parr, J. E. Schumm, I. W. Hutt, C. J. Taylor,
S. M. Astley, and C. R. M. Boggis, “Model-based detection of spiculated
lesions in mammograms,” Med. Image Anal., vol. 3, no. 1, pp. 39-62,
1999.

[3] M. A. Kupinski and M. L. Giger, “Automated seeded lesion segmentation
on digital mammograms,” IEEE Trans. Med. Imag., vol. 17, pp. 510-517,
Aug. 1998.

[4] N. Karssemeijer and G. M. te Brake, “Detection of stellate distortions
in mammogram,” IEEE Trans. Med. Imag., vol. 15, pp. 611-619, Oct. 1996.

[5] W. K. Zouras, M. L. Giger, P. Lu, D. E. Wolverton, C. J. Vyborny, and
K. Doi, “Investigation of a temporal subtraction scheme for computerized
detection of breast masses in mammograms,” Excerpta Medica, vol. 1119,
pp. 411-415, 1996.

[6] N. Petrick, H. P. Chan, B. Sahiner, and D. Wei, “An adaptive
density-weighted contrast enhancement filter for mammographic breast mass
detection,” IEEE Trans. Med. Imag., vol. 15, no. 1, pp. 59-67, 1996.

[7] M. Sameti and R. K. Ward, “A fuzzy segmentation algorithm for
mammogram partition,” in Digital Mammography, ser. International Congress
Series, K. Doi, Ed. Amsterdam, The Netherlands: Elsevier, 1996,
pp. 471-474.

[8] W. P. Kegelmeyer Jr., J. M. Pruneda, P. D. Bourland, A. Hillis,
M. W. Riggs, and M. L. Nipper, “Computer-aided mammographic screening
for spiculated lesions,” Radiology, vol. 191, pp. 331-337, 1994.

[9] F. F. Yin, M. L. Giger, C. J. Vyborny, K. Doi, and R. A. Schmidt,
“Comparison of bilateral-subtraction and single-image processing
techniques in the computerized detection of mammographic masses,”
Investigat. Radiol., vol. 28, no. 6, pp. 473-481, 1993.

[10] B. Zheng, Y. H. Chang, and D. Gur, “Computerized detection of masses
in digitized mammograms using single-image segmentation and a multilayer
topographic feature analysis,” Acad. Radiol., vol. 2, pp. 959-966, 1995.

[11] H. D. Li, M. Kallergi, L. P. Clarke, V. K. Jain, and R. A. Clark,
“Markov random field for tumor detection in digital mammography,” IEEE
Trans. Med. Imag., vol. 14, pp. 565-576, Sept. 1995.

[12] M. L. Giger, C. J. Vyborny, and R. A. Schmidt, “Computerized
characterization of mammographic masses: Analysis of spiculation,” Cancer
Lett., vol. 77, pp. 201-211, 1994.

[13] T. K. Lau and W. F. Bischof, “Automated detection of breast tumors
using the asymmetry approach,” Comput. Biomed. Res., vol. 24, no. 9,
pp. 1501-1513, 1995.

[14] M. J. Bianchi, A. Rios, and M. Kabuka, “An algorithm for detection
of masses, skin contours, and enhancement of microcalcifications in
mammograms,” in Proc. Symp. Computer Assisted Radiology, Winston-Salem,
NC, June 1994, pp. 57-64.

[15] T. Lei and W. Sewchand, “Statistical approach to x-ray CT imaging and
its application in image analysis-Part II: A new stochastic model-based
image segmentation technique for x-ray CT image,” IEEE Trans. Med.
Imag., vol. 11, pp. 62-69, Feb. 1992.

[16] Y. Wang, T. Adali, and S.-C. B. Lo, “Automatic threshold selection
using histogram quantization,” SPIE J. Biomedical Optics, vol. 2, no. 2,
pp. 211-217, April 1997.

[17] J. Zhang and J. W. Modestino, “A model-fitting approach to cluster
validation with application to stochastic model-based image segmentation,”
IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 1009-1017,
Oct. 1990.




[18] H. Li, K. J. R. Liu, Y. Wang, and S. C. Lo, “Morphological filtering
and stochastic modeling-based segmentation of masses on mammographic
images,” in Proc. IEEE Nuclear Science Symp. Medical Imaging Conf.,
1996, pp. 1792-1796.

[19] J. Serra, Image Analysis and Mathematical Morphology. London, U.K.:
Academic, 1982.

[20] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood
from incomplete data via the EM algorithm,” J. Roy. Statist. Soc. Ser. B,
vol. 39, pp. 1-38, 1977.

[21] Y. Wang, T. Adali, C. M. Lau, and S. Y. Kung, “Quantitative analysis
of MR brain image sequences by adaptive self-organizing finite mixtures,”
J. VLSI Signal Processing, vol. 18, no. 3, pp. 219-240, 1998.

[22] Y. Wang, T. Adali, S. Y. Kung, and Z. Szabo, “Quantification and
segmentation of brain tissues from MR images: A probabilistic neural
network approach,” IEEE Trans. Image Processing, vol. 7, pp. 1165-1181,
Aug. 1998.

[23] H. Akaike, “A new look at the statistical model identification,” IEEE
Trans. Automat. Contr., vol. 19, no. 6, pp. 716-723, 1974.

[24] J. Rissanen, “Modeling by shortest data description,” Automat.,
vol. 14, pp. 465-471, 1978.




[25] R. A. Hummel and S. W. Zucker, “On the foundations of relaxation
labeling processes,” IEEE Trans. Pattern Anal. Machine Intell., vol. 5,
pp. 267-286, Mar. 1983.

[26] A. Hoover, G. Jean-Baptiste, X. Jiang, P. J. Flynn, H. Bunke,
D. B. Goldgof, K. Bowyer, D. W. Eggert, A. Fitzgibbon, and R. B. Fisher,
“An experimental comparison of range image segmentation algorithms,” IEEE
Trans. Pattern Anal. Machine Intell., vol. 18, pp. 673-688, July 1996.

[27] Y. J. Zhang, “A survey on evaluation methods for image segmentation,”
Pattern Recogn., vol. 29, no. 8, pp. 1335-1346, 1996.

[28] A. M. Bensaid, L. O. Hall, J. C. Bezdek, L. P. Clarke, M. L. Silbiger,
J. A. Arrington, and R. F. Murtagh, “Validity-guided clustering with
applications to image segmentation,” IEEE Trans. Fuzzy Syst., vol. 4,
pp. 112-123, May 1996.

[29] H. Li, Y. Wang, K. J. R. Liu, S.-C. B. Lo, and M. T. Freedman,
“Computerized Radiographic Mass Detection — Part II: Decision Support by
Featured Database Visualization and Modular Neural Networks,” IEEE
Trans. Med. Imag., vol. 20, no. 4, pp. 302-313, Apr. 2001.




IEEE  TRANSACTIONS  ON  MEDICAL  IMAGING.  VOL.  20,  NO.  4,  APRIL  2001 


Computerized  Radiographic  Mass  Detection — Part  II: 
Decision  Support  by  Featured  Database  Visualization 
and  Modular  Neural  Networks 

Huai  Li,  Yue  Wang,  K.  J.  Ray  Liu*,  Shih-Chung  B.  Lo,  and  Matthew  T.  Freedman 


Abstract — Based on the enhanced segmentation of suspicious mass areas,
further development of computer-assisted mass detection may be decomposed
into three distinctive machine learning tasks: 1) construction of the
featured knowledge database; 2) mapping of the classified and/or
unclassified data points in the database; and 3) development of an
intelligent user interface. A decision support system may then be
constructed as a complementary machine observer that should enhance the
radiologists' performance in mass detection. We adopt a mathematical
feature extraction procedure to construct the featured knowledge database
from all the suspicious mass sites localized by the enhanced segmentation.
The optimal mapping of the data points is then obtained by learning the
generalized normal mixtures and decision boundaries, where a probabilistic
modular neural network (PMNN) is developed to carry out both soft and hard
clustering. A visual explanation of the decision making is further
developed as a decision support tool, based on an interactive
visualization hierarchy through the probabilistic principal component
projections of the knowledge database and the localized optimal displays
of the retrieved raw data. A prototype system is developed and pilot
tested to demonstrate the applicability of this framework to mammographic
mass detection.

Index Terms — Feature extraction, knowledge database, mass detection,
neural network, visual explanation.


I. Introduction

IN ORDER to improve mass lesion detection and classification in clinical
screening and/or diagnosis of breast cancers, many sophisticated
computer-assisted diagnosis (CAD) systems have been recently developed
[1]-[10]. Although the clinical roles of CAD systems may still be
debatable, their fundamental role should be complementary to the
radiologists' clinical duties, where the pathways taken by the machine
observer and the human observer to achieve the ultimate performance
enhancement may not necessarily be close. For example, CAD systems may
attack the tasks that radiologists cannot perform well or find difficult
to perform. Because masses are generally larger and more complex in
appearance than microcalcifications, especially given the spicules in
malignant lesions, feature-based approaches are widely adopted in CAD
systems [1]-[4], [6], [7]. Kegelmeyer first reported promising results for
detecting spiculated tumors based on local edge characteristics and Laws
texture features [7]. Zwiggelaar et al. developed a statistical model to
describe and detect the abnormal pattern of linear structures of
spiculated lesions [1]. Karssemeijer et al. [2] proposed identifying
stellate distortions by using the orientation map of line-like structures.
Petrick et al. proposed reducing false positive detections by
incorporating breast tissue composition information [4]. Zhang et al. used
the Hough spectrum to detect spiculated lesions [6].

Fig. 1. Major components in CAD.

Although many previously proposed approaches have led to impressive
results [1]-[5], [7], several fundamental issues remain unresolved in the
application of CAD systems. Fig. 1 shows a general block diagram of CAD
systems. Previous research has demonstrated that: 1) breast cancer is
missed on mammograms in part because the optical density and contrast of
the cancer are not optimal for the human observer; 2) computer-based
detection appears to be more affected by different criteria than human
perception; 3) the challenges and pathways for the human and machine
observers may be quite different; and 4) decision making by CAD systems is
largely not transparent to the user. For example, the training cases
contributing to the database are often selected by the human observer,
while the featured knowledge database is constructed through mathematical
pathways of feature extraction. A mismatch may therefore exist between the
human-supervised case selection in training and the machine-dominant mass
candidate selection in testing. Second, the featured knowledge database is
often high-dimensional with complex internal structures. Imposing a
heuristically designed neural network for learning from the training data
set may prevent a correct identification of the intrinsic data structure
and an accurate estimation of the class boundaries. There may also be a
mismatch between the data structure and the classifier architecture, or
between the class boundaries and the decision boundaries. Furthermore,
since the machine observer and the human observer may not detect the same
set of masses, the “black box” nature of most CAD systems will prevent a
natural on-line integration of human intelligence and further upgrading of
a CAD system. An interactive user interface should be considered to
leverage the complementary role of CAD in clinical practice.

As a step toward improving the performance of a CAD system, we have
devoted considerable effort to conducting various studies and developing
reliable image enhancement and lesion selection techniques. The methods
and results have been reported in [24], where the purposes of the research
were to localize the potential mass sites and to support accurate feature
extraction. This paper addresses the further development of
computer-assisted mass detection based on 1) construction of the featured
knowledge database; 2) mapping of the classified and/or unclassified data
points in the database; and 3) development of an intelligent user
interface (IUI). The clinical goal is to eliminate, through featured
discrimination, the false positive sites that correspond to normal dense
tissues with mass-like appearances. We adopt a mathematical feature
extraction procedure to construct the featured knowledge database from all
the suspicious mass sites localized by the enhanced segmentation. The
optimal mapping of the data points is then obtained by learning the
generalized normal mixtures and decision boundaries, where a probabilistic
modular neural network (PMNN) is developed to carry out both soft and hard
clustering. A visual explanation of the decision making is further
developed as a decision support tool, based on an interactive
visualization hierarchy through the probabilistic principal component
projections of the knowledge database and the localized optimal displays
of the retrieved raw data. The motivation for this work comes from the
following considerations. First, though both human and machine observers
use the same set of raw data in the diagnostic stage, the construction of
the knowledge database for training machine classifiers and that
accomplished by the human brain are indeed different. Thus, the knowledge
database should be established with both machine- and expert-organized
representative cases. Second, a quantitative understanding of the
knowledge database used by the machine observer should be acquired, both
to logically compare and/or predict the performance of CAD systems with
respect to human observers without under- or over-estimation, and to
optimize the feature extraction and the design of the machine learner for
the best final performance. Finally, since the human and machine observers
indeed take different learning and intelligence pathways, an IUI should be
developed to visually (i.e., transparently) explain the entire internal
decision making process of the CAD system to the human observer, so as to
enhance the clinical decision when facing either consistent or conflicting
opinions.

The major differences between our work and the previous work [1]-[10] are
as follows.

1) We construct a knowledge database by combining both expert- and
machine-selected cases, where the assignment of class memberships
(e.g., mass and nonmass classes) is supervised by the radiologists or
the pathological report after all the cases are collected.

2) We impose a model identification procedure to determine the optimal
number and kernel shape of the local clusters within each of the two
classes in a high-dimensional feature space. The model is then
estimated using the expectation-maximization (EM) algorithm and
information theory.

3) We develop a PMNN, which is considered as a nonlinear classifier, to
carry out the mapping function of the knowledge database. In the
knowledge database, the decision likelihood boundaries and the class
prior probabilities are determined in a separate fashion, and the
structure of the PMNN is optimized by adapting to the database
structure.

4) We derive a probabilistic principal component projection scheme to
reduce the dimensionality of the feature space for natural human
perception. The scheme leads to a hierarchical visualization
algorithm allowing the complete data set to be analyzed at the top
level, with best separated clusters and subclusters of data points
analyzed at deeper levels.

The framework of the proposed method for mass detection is illustrated in
Fig. 2. The rest of this paper is organized as follows. In Section II, the
procedure of the knowledge database construction is described. The data
mapping process for decision making is presented in Section III. Section
IV presents the design of the IUI for CAD systems. Finally, major results
and discussions are summarized in Section V.

II. Knowledge Database Construction

Given  the  available  information  contained  in  the  raw  data  of 
mass  sites  and  in  order  to  establish  machine  intelligence  carried 
out  by  various  machine  observers,  a  knowledge  database  may 
be  constructed  in  a  multidimensional  feature  space.  It  should  be 
emphasized  however  that  the  knowledge  acquired  by  the  human 
brain  uses  much  more  sophisticated  processes  than  the  artificial 
systems.  Though  feature  extraction  has  been  a  key  step  in  most 
pattern  analysis  tasks,  the  mathematical  procedures  are  often 
done  intuitively  and  heuristically.  The  general  guidelines  are: 

1) Discrimination: Features of patterns in different classes
should have significantly different values.

2)  Reliability:  Features  should  have  similar  values  for  the 
patterns  of  the  same  class. 

3)  Independence:  Features  should  not  be  strongly  correlated 
to  each  other. 

4)  Optimality:  Some  redundant  features  should  be  deleted. 
A  small  number  of  features  is  preferred  for  reducing  the 
complexity  of  the  classifier. 

Many useful image features have been suggested previously
by both the image processing and pattern analysis communities
[11]-[13]. These features can be divided into three categories,
namely, intensity features, geometric features, and texture
features, whose values are calculated from the pixel matrices
of the regions of interest (ROIs). Though these features are
mathematically well defined, they may not be complete since
they cannot capture all aspects of human perception. Thus, in
this study, we have included several additional expert-suggested
features to reflect the radiologists' experience. The typical
features are summarized in Table I, and Fig. 3 shows the raw
image of the corresponding featured sites.

Fig. 2. The flow diagram of mass detection in digital mammograms.

The joint histogram of the feature point distribution extracted
from true and false mass regions is investigated, and the
features that can better separate the true and false mass regions
are selected for further study. Our experience has suggested that
three features, i.e., the site area, two measures of compactness
(circularity), and the difference entropy, have better
discrimination and reliability properties. Their definitions are
given as follows.


TABLE I
The Summary of Mathematical Features

Feature Sub-Space        Features

A. Intensity Features    1. contrast measure of ROIs;
                         2. standard deviation inside ROIs;
                         3. mean gradient of ROIs boundary

B. Geometric Features    1. area measure;
                         2. circularity measure;
                         3. deviation of the normalized radial length;
                         4. boundary roughness;

C. Texture Features      1. energy measure;
                         2. correlation of co-occurrence matrix;
                         3. inertia of co-occurrence matrix;
                         4. entropy of co-occurrence matrix;
                         5. inverse difference moment;
                         6. sum average;
                         7. sum entropy;
                         8. difference entropy;
                         9. fractal dimension of surface of ROI;

1)  Compactness  1 


where $A$ is the area of the actual suspected region, and
$A_1$ is the area of the overlapped region of $A$ and the
effective circle $A_c$, which is defined as the circle whose area
is equal to $A$ and is centered about the corresponding
centroid of $A$.

2)  Compactness  2 


where  P  is  the  boundary  perimeter,  and  A  is  the  area  of 
region. 
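
A minimal sketch of how two compactness measures of this kind might be computed from a binary region mask. Both formulas are assumptions of this sketch and are not taken from the text: compactness 1 is assumed to be the overlap ratio $A_1 / A$ with the equal-area circle, and compactness 2 is assumed to be the common isoperimetric ratio $P^2 / (4 \pi A)$. The function names are hypothetical.

```python
import numpy as np

def compactness_1(mask):
    """Assumed form A_1 / A: overlap of a region with its equal-area circle."""
    mask = np.asarray(mask, dtype=bool)
    area = mask.sum()                         # A: pixel count of the region
    if area == 0:
        raise ValueError("empty region")
    rows, cols = np.nonzero(mask)
    cy, cx = rows.mean(), cols.mean()         # centroid of the region
    radius_sq = area / np.pi                  # circle with the same area as the region
    yy, xx = np.indices(mask.shape)
    circle = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius_sq
    overlap = np.logical_and(mask, circle).sum()   # A_1
    return overlap / area

def compactness_2(perimeter, area):
    """Assumed form P^2 / (4 pi A); equals 1 for an ideal circle."""
    return perimeter ** 2 / (4.0 * np.pi * area)
```

Under these assumed forms, a disk-like region gives a compactness 1 near one and a compactness 2 near one, while elongated or irregular regions move away from those values in opposite directions.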


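3) Difference Entropy

$$DH_{d,\theta} = -\sum_{k=0}^{L-1} p_{x-y}(k) \log p_{x-y}(k) \tag{3}$$

where

$$p_{x-y}(k) = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p_{d,\theta}(i, j), \quad |i - j| = k. \tag{4}$$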
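
The difference entropy can be sketched in a few lines once a gray-level co-occurrence matrix is available. The sketch below assumes a square co-occurrence matrix for one displacement and angle, normalizes it to a joint probability, and uses the natural logarithm; the function name and the logarithm base are assumptions, since the text does not fix them.

```python
import numpy as np

def difference_entropy(glcm, eps=1e-12):
    """Difference entropy of a square co-occurrence matrix (natural log assumed)."""
    p = np.asarray(glcm, dtype=float)
    p = p / p.sum()                           # normalize counts to joint probabilities
    levels = p.shape[0]
    i, j = np.indices(p.shape)
    # p_{x-y}(k): total probability of gray-level pairs with |i - j| = k
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=p.ravel(), minlength=levels)
    nz = p_diff > eps                         # terms with zero probability contribute 0
    return float(-np.sum(p_diff[nz] * np.log(p_diff[nz])))
```

A purely diagonal co-occurrence matrix (no gray-level differences between neighboring pixels) gives zero difference entropy, while a matrix whose mass is spread over many values of |i - j| gives a larger value.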

Fig. 3. One example of mass segmentation and boundary extraction. (a) Mass
patch; (b) segmentation; (c) boundary extraction.

Several important observations are worth reiteration:

1) The knowledge database that will be used by the CAD
system is constructed from the cases selected by both the
lesion localization procedure and the human expert's
experience. This joint set provides more complete knowledge to
the machine observer. In particular, during interactive
decision making, the CAD system can still provide an opinion
on cases that are missed by the localization procedure
but presented to the system by the radiologists.

2) The knowledge database is defined quantitatively in a
high dimensional feature space. It provides not only the
knowledge for training the machine observer, but also an
objective basis for evaluating the quality of feature
extraction or the network's learning capability, and for
on-line visual explanation.

3) The assignment of the cases' class memberships (e.g.,
mass and nonmass classes) is supervised by the radiologists
or pathological reports. A complete knowledge database
includes three subsets: raw data of mass-like sites,
corresponding feature points, and class membership labels.


III. Data Mapping for Decision Making

The decision making support provided by a CAD system addresses
the problem of mapping a knowledge database, given a finite set
of data examples. The mapping function can therefore be
interpreted as a quantitative representation of the knowledge about
the mass lesions contained in the database [14]. Instead of
mapping the whole data set using a single complex network, it is
more practical to design a set of simple class subnets with local
mixture clusters, each of which represents a specific region
of the knowledge space. Inspired by the principle of
divide-and-conquer in applied statistics, PMNN has become increasingly
popular in machine learning research [14], [15], [19]-[22]. In
this section, we present its application to the problem of
mapping from databases in mass detection, with a constructive
criterion for designing the network architecture and a learning
algorithm that are governed by information theory [25].


A.  Statistical  Modeling 

The quantitative mapping of a database may be decomposed
into three distinctive learning tasks: the detection of the
structure of each class model with local mixture clusters; the
estimation of the data distributions for each induced cluster inside
each class; and the classification of the data into classes that
realizes the data memberships. Recently, there has been considerable
success in using finite mixture distributions for data mapping [15],
[17], [18], [20]. Assume that the data points $x_i$ in a
multidimensional database come from $M$ classes
$\{\omega_1, \ldots, \omega_r, \ldots, \omega_M\}$, and each class
contains $K_r$ clusters $\{\theta_1, \ldots, \theta_k, \ldots, \theta_{K_r}\}$,
where $\omega_r$ is the model parameter vector of class $r$, and
$\theta_k$ is the kernel parameter vector of cluster $k$ within class $r$.
The class conditional probability measure for any data point inside the
class $r$, i.e., the standard finite mixture distribution (SFMD), can
be obtained as a sum of the following general form:

$$f(u|\omega_r) = \sum_{k=1}^{K_r} \pi_k \, g(u|\theta_k) \tag{5}$$

where the mixing weights $\pi_k$ have a summation equal to one, and
$g(u|\theta_k)$ is the kernel function of the local cluster distribution.
For the model of global class distributions, we denote the
Bayesian prior for each class by $P(\omega_r)$. Then the sufficient
statistics, according to Bayes' rule, are the posterior probabilities
$P(\omega_r|x_i)$ given a particular observation $x_i$:

$$P(\omega_r|x_i) = \frac{P(\omega_r) \, f(x_i|\omega_r)}{p(x_i)} \tag{6}$$

where $p(x_i) = \sum_{r=1}^{M} P(\omega_r) \, f(x_i|\omega_r)$.
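
As a rough numerical illustration of (5) and (6), the sketch below evaluates a mixture class likelihood and the resulting class posteriors. The Gaussian kernel, the function names, and the layout of `class_models` (one tuple of weights, means, and covariances per class) are assumptions made for illustration only.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)) / norm

def class_likelihood(x, weights, means, covs):
    """f(x | omega_r): weighted sum of cluster kernels, as in (5)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

def posteriors(x, priors, class_models):
    """P(omega_r | x) for every class r via Bayes' rule, as in (6)."""
    joint = np.stack(
        [p * class_likelihood(x, *model) for p, model in zip(priors, class_models)],
        axis=1,
    )
    return joint / joint.sum(axis=1, keepdims=True)   # divide by p(x)
```

Each row of the returned array sums to one, and the class with the largest entry is the Bayes decision for that observation.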


B. Class Distribution Learning

Class distribution learning addresses the combined estimation
of the regional parameters $(\pi_k, \theta_k)$ and detection of the
structural parameter $K_r$ and the kernel shape of $g(\cdot)$ in (5)
based on the observations $x_r$. One natural criterion used for learning
the optimal parameter values is to minimize the distance between
the SFMD, denoted by $f_r(u)$, and the class data histogram,
denoted by $f_{x_r}(u)$ [17]. In this paper, we use relative entropy
(Kullback-Leibler distance), suggested by information theory
[25], as the distance measure (for simplicity we use $f_r(u)$ to
denote $f(u|\omega_r)$ in our formulation), given by

$$D(f_{x_r} \| f_r) = \sum_{u} f_{x_r}(u) \log \frac{f_{x_r}(u)}{f(u|\omega_r)}. \tag{7}$$
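
The relative entropy in (7) is straightforward to evaluate once the histogram and the model have been sampled on the same bins. The sketch below assumes both are given as nonnegative arrays over a common set of bins and renormalizes them; the function name and the small floor `eps` on the model values are assumptions.

```python
import numpy as np

def relative_entropy(hist, model, eps=1e-12):
    """Kullback-Leibler distance between a data histogram and a model on shared bins."""
    h = np.asarray(hist, dtype=float)
    f = np.asarray(model, dtype=float)
    h = h / h.sum()                           # normalized class data histogram
    f = f / f.sum()                           # normalized model distribution
    nz = h > 0                                # empty histogram bins contribute nothing
    return float(np.sum(h[nz] * np.log(h[nz] / np.maximum(f[nz], eps))))
```

The value is zero when the model reproduces the histogram exactly and grows as the fit degrades, which is why it can serve as an objective measure of the mapping quality.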


We have previously shown that when relative entropy is used as
a distance measure, the distance minimization method is
equivalent to the soft-split classification-based method under the
criterion of maximum likelihood (ML) [23].

Another important issue concerning unsupervised distribution
learning is the detection of the structural parameters of
the class distribution, called model selection [15]. The
objective here is to propose a systematic strategy for determining the
optimal number and kernel shape of local clusters when
prior knowledge is not available. This is indeed the case when
the structure of the mass lesion patterns for a particular type of
cancer may be arbitrarily complex, so correct identification of
the database structure is very important. Thus, it is desirable
to have a neural network structure that is adaptive, in the
sense that the number and kernel shape of local clusters are not
fixed beforehand. In this paper, we applied two popular
information theoretic criteria, i.e., the Akaike information criterion
and the minimum description length, to guide the model selection
procedure [24].
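
A minimal sketch of order selection with these two criteria, assuming their textbook forms: AIC as minus twice the log likelihood plus twice the number of free parameters, and MDL as minus the log likelihood plus half the number of free parameters times the log of the sample size. The exact expressions used in [24] are not reproduced in the text, so these forms, the function names, and the `candidates` layout are assumptions.

```python
import math

def aic(log_likelihood, n_free_params):
    """Akaike information criterion (textbook form, assumed)."""
    return -2.0 * log_likelihood + 2.0 * n_free_params

def mdl(log_likelihood, n_free_params, n_samples):
    """Minimum description length (textbook form, assumed)."""
    return -log_likelihood + 0.5 * n_free_params * math.log(n_samples)

def select_order(candidates, n_samples, criterion="mdl"):
    """Pick the cluster number K with the smallest criterion value.

    candidates maps K -> (maximized log likelihood, number of free parameters).
    """
    def score(item):
        _, (loglik, n_params) = item
        if criterion == "aic":
            return aic(loglik, n_params)
        return mdl(loglik, n_params, n_samples)
    return min(candidates.items(), key=score)[0]
```

In practice each candidate K would first be fitted by ML (for example with the EM algorithm), and the resulting log likelihoods would be passed to `select_order`.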

As the counterpart of adaptive model selection, there are
many numerical techniques to perform ML estimation of cluster
parameters [17]. For example, the EM algorithm first calculates the
posterior Bayesian probabilities of the data through the
observations and the current parameter estimates (E-step) and then
updates the parameter estimates using generalized mean ergodic
theorems (M-step). The procedure cycles back and forth between
these two steps. The successive iterations increase the likelihood
of the model parameters. The scheme provides winner-takes-in
probability (Bayesian "soft") splits of the data, hence allowing
the data to contribute simultaneously to multiple clusters. For
the sake of simplicity, we assume the kernel shape of local
clusters to be a multidimensional Gaussian with mean $\mu_{kr}$ and
variance $\Gamma_{kr}$. We summarize the EM algorithm as follows.

1) E-Step: for training sample $t = 1, \ldots, N$, compute
the probabilistic membership

$$z_{kr}^{(m)}(t) = \frac{\pi_{kr}^{(m)} \, g\big(x(t) \,|\, \mu_{kr}^{(m)}, \Gamma_{kr}^{(m)}\big)}{\sum_{l=1}^{K_r} \pi_{lr}^{(m)} \, g\big(x(t) \,|\, \mu_{lr}^{(m)}, \Gamma_{lr}^{(m)}\big)} \tag{8}$$

2) M-Step: compute the updated parameter estimates

$$\pi_{kr}^{(m+1)} = \frac{1}{N} \sum_{t=1}^{N} z_{kr}^{(m)}(t) \tag{9}$$

$$\mu_{kr}^{(m+1)} = \frac{\sum_{t=1}^{N} z_{kr}^{(m)}(t) \, x(t)}{\sum_{t=1}^{N} z_{kr}^{(m)}(t)} \tag{10}$$

$$\Gamma_{kr}^{(m+1)} = \frac{\sum_{t=1}^{N} z_{kr}^{(m)}(t) \, \big[x(t) - \mu_{kr}^{(m+1)}\big] \big[x(t) - \mu_{kr}^{(m+1)}\big]^T}{\sum_{t=1}^{N} z_{kr}^{(m)}(t)} \tag{11}$$
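
The E-step and M-step above can be sketched for a single class as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the initial means are supplied by the caller, every cluster starts from the global data covariance, a small ridge `reg` is added to each covariance for numerical stability, and the iteration count is fixed. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)) / norm

def em_gmm(x, init_means, n_iter=50, reg=1e-6):
    """EM for a Gaussian mixture fitted to the samples of one class."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    means = np.array(init_means, dtype=float)
    k = means.shape[0]
    weights = np.full(k, 1.0 / k)
    base_cov = np.cov(x, rowvar=False).reshape(d, d) + reg * np.eye(d)
    covs = np.array([base_cov.copy() for _ in range(k)])
    for _ in range(n_iter):
        # E-step: probabilistic membership of every sample in every cluster
        dens = np.stack(
            [w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs)],
            axis=1,
        )
        z = dens / dens.sum(axis=1, keepdims=True)
        # M-step: membership-weighted updates of weights, means, covariances
        nk = z.sum(axis=0)
        weights = nk / n
        means = (z.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (z[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs
```

Passing explicit initial means is a deliberate design choice in this sketch: EM only finds a local optimum, and two starting means placed inside the same cluster can leave the components nearly identical for many iterations.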


C.  Decision  Boundary  Learning 

The objective of data classification is to realize the class
membership $\omega_r$ for each data point based on the observation
$x_i$ and the class statistics $\{P(\omega_r), f(u|\omega_r)\}$. It is
well known that the optimal data classifier is the Bayes classifier
since it can achieve the minimum rate of classification error [26].
Measuring the average classification error by the mean squared
error $E$, many previous researchers have shown that minimizing
$E$ by adjusting the parameters of class statistics is equivalent to
directly approximating the posterior class probabilities when
dealing with the two class problem [13], [26]. In general, for the
multiple class problem the optimal Bayes classifier (minimum
average error) classifies input patterns based on their posterior
probabilities: input $x_i$ is classified to class $\omega_r$ if

$$P(\omega_r|x_i) > P(\omega_j|x_i) \tag{12}$$

for all $j \neq r$. It should be noted that in the formulation of
classifier design, the optimal criterion used for future data
classification has been intuitively and directly applied to the learning
of class statistics from the training data set.

Direct learning of posterior probability is a complex task.
Great effort has been made in designing the classifier as an
estimator of the posterior class probability [19]. By closely
investigating the global class distribution modeling, we found that
the classifier design for data classification can be dramatically
simplified at the learning stage. Revisiting (6), since the class prior
probability $P(\omega_r)$ is a known parameter when supervised
learning is applied, the posterior class probability $P(\omega_r|x_i)$ can
be obtained without any further effort. Thus, by conditioning on
$P(\omega_r)$, the problem is formulated as a supervised classification
learning of the class conditional likelihood density $f(u|\omega_r)$.
Accordingly, an efficient supervised algorithm to learn the class
conditional likelihood densities, called "decision-based
learning" [21], is adopted in this paper. The decision-based
learning algorithm uses the misclassified data to adjust the
density functions $f(u|\omega_r)$, which are initially obtained using the
unsupervised learning scheme described previously, so that the
minimum classification error can be achieved. Define the $r$th
class discriminant function $\phi_r(x_i, w)$ to be $P(\omega_r) f(x_i|\omega_r)$.
Given a set of training patterns $X = \{x_i; \ i = 1, 2, \ldots, M\}$,
the set $X$ is further divided into the "positive training set"
$X^{+} = \{x_i; \ x_i \in \omega_r, \ i = 1, 2, \ldots, N\}$ and the "negative
training set" $X^{-} = \{x_i; \ x_i \notin \omega_r, \ i = N+1, N+2, \ldots, M\}$.
If the misclassified training pattern is from the positive training
set, reinforced learning will be applied. If the training pattern
belongs to the negative training set, we anti-reinforce the
learning, i.e., pull the kernels away from the problematic
regions. The boundary refinement is summarized as follows:

$$\begin{aligned}
\text{Reinforced Learning:} \quad & w^{(j+1)} = w^{(j)} + \eta \, l'(d(t)) \, \nabla \phi(x(t), w) \\
\text{Antireinforced Learning:} \quad & w^{(j+1)} = w^{(j)} - \eta \, l'(d(t)) \, \nabla \phi(x(t), w)
\end{aligned} \tag{13}$$

PMNN is a probabilistic modular network designed especially
for data classification, where a Bayesian decomposition of
the learning process provides a unique opportunity to optimize
the structure of the training scheme [14], [22]. Since the information
about class population is, in general, physically uncorrelated
with the conditional features about the individual class, a
decoupled two-step training, in terms of both network structure and
learning rule, makes much more sense than that in the
conventional posterior-type neural networks, i.e., the conditional
likelihood of each class and the class Bayesian prior should be
adjusted separately in the classification spaces. Thus, PMNN
consists of several disjoint subnets and a winner-take-all network.
The subnet outputs of the PMNN are designed to model the
likelihood functions (likelihood-type network), which are first
estimated from equally presented class samples, and the final
decision boundaries are determined simply by weighting the
likelihood by the class populations. For an M-class classification
problem, PMNN contains M different class subnets, each of which
represents one data class in the database. Within each subnet,
several neurons (or clusters) are applied in order to handle
problems which have complicated decision boundaries. The outputs
of class subnets are fed into a winner-take-all network. The
winner-take-all network categorizes the input pattern to the data
class whose subnet produces the highest output value.

Fig. 4. The structure of the PMNN.

The structure of the PMNN used in this study is shown in
Fig. 4. The PMNN consists of two subnets. Within each subnet,
there are several neurons (or clusters). The outputs of class
subnets are fed into a probability winner processor, which
categorizes the input pattern to the data class whose subnet produces
the highest probability value. The training scheme of the PMNN
is based on unsupervised learning. Each subnet is trained
individually, and no mutual information across the classes may
be utilized. In our study, one modular expert is trained to
detect true masses, and the other is trained to detect false masses.
After training, the feature vectors extracted from ROIs are
entered into this network to classify true or false masses. In both
training and testing processes, we assume that the feature
vectors $x_i$ in class $r$ ($r = 1, \ldots, M$) follow a mixture of
multidimensional Gaussian distributions, i.e.,

$$f(x_i|\omega_r) = \sum_{k=1}^{K_r} \pi_{kr} \, p_k(x_i|\omega_r) \tag{14}$$

where $\sum_{k=1}^{K_r} \pi_{kr} = 1$ and $p_k(x_i|\omega_r) = N(\mu_{kr}, \Gamma_{kr})$
is a multidimensional Gaussian distribution within cluster $k$ of class $r$.

IV.  Interactive  Visual  Explanation 

In order to improve the utility of the CAD systems in clinical
practice, an IUI is highly desired. Different from many
previously proposed approaches, we have organized our database
from both mathematically localized and radiologist-selected
mass-like cases, and formed the featured knowledge database
based on both mathematically based and radiologist-selected
image features. This off-line effort should enhance the
performance of the machine observer through a better quality
training set and an optimal design of the neural network architecture.
Our experience has suggested, however, that further improvement
of CAD systems requires on-line natural integration of
human intelligence with the computer's output, since human
perception has played and can play an important role in clinical
decision making. In this research, we have developed a pilot
IUI whose major functions include: 1) interactive visual
explanation of the CAD decision making process; 2) on-line
retrieval of the optimally displayed raw data and/or similar
cases; and 3) supervised upgrade of the knowledge database by
radiologist-driven input of the "unseen" and/or "typical" cases.
Our preliminary studies have shown that the visual presentation
of both raw data and CAD results to radiologists may provide
visual cues for improved decision making.

As a step toward understanding the complex information
from data and relationships, structural and discriminative
knowledge reveals insight that may prove useful in data
mining. Hierarchical minimax entropy modeling and
probabilistic principal component projection are proposed for data
explanation; the approach is both statistically principled and visually
effective at revealing all of the interesting aspects of the data
set. The methods involve multiple use of standard finite normal
mixture models and probabilistic principal component
projections. The strategy is that the top-level model and projection
should explain the entire data set, best revealing the presence
of clusters and relationships, while lower-level models and
projections should display internal structure within individual
clusters, such as the presence of subclusters and attribute trends,
which might not be apparent in the higher-level models and
projections. With many complementary mixture models and
visualization projections, each level will be relatively simple
while the complete hierarchy maintains overall flexibility yet
still conveys considerable structural information. In particular,
a probabilistic principal component neural network is
developed to generate optimal projections, leading to a hierarchical
visualization algorithm. This algorithm allows the complete
data set to be analyzed at the top level, with the best separated
subclusters of data points analyzed at deeper levels.

Research evidence suggests that for analysis of complex and
high-dimensional data sets, structure decomposition and
dimensionality reduction are the natural strategies in which the
model-based approach and visual explanation have proven to be
powerful and widely applicable [27]. However, there is a trade-off
between maximizing (structure decomposition) and minimizing
(dimensionality reduction) the entropy of the system. In this
research, a minimax entropy approach is adopted through the
use of progressive model identification and principal
component projection. The complete visual explanation hierarchy is
generated by performing principal projection (dimensionality
reduction) and model identification (structure decomposition)
in two iterative steps using information theoretic criteria, the EM
algorithm, and probabilistic principal component analysis (PCA).
Hierarchical probabilistic principal component visualization
involves: 1) evaluation of posterior probabilities for the mixture data
set; 2) estimation of multiple principal component axes from the
probabilistic data set; and 3) generation of a complete hierarchy
of visual projections.

Suppose the data space is $d$-dimensional with coordinates
$y_1, \ldots, y_d$ and the data set consists of a set of $d$-dimensional
vectors $\{t_i\}$ where $i = 1, \ldots, N$. Now consider a
three-dimensional (3-D) latent space $x = (x_1, x_2, x_3)^T$ together with
a linear function which maps the latent space to the data space by
$y = Wx + b$, where $W$ is a $d \times 3$ matrix and $b$ is a $d$-dimensional
mean vector. If we introduce a probability distribution $p(x)$ over
the latent space given by a Gaussian estimated from the latent
variables $\{x_i\}$, then a similar full-dimensional Gaussian
distribution in data space can be defined by convolving this
distribution with a general diagonal Gaussian conditional probability
distribution $p(t|x, A_d)$ in data space, where $A_d$ is the covariance
matrix, resulting in a final form of

$$p(t) = \int p(t|x) \, p(x) \, dx \tag{15}$$

where the log likelihood function for this model is given by
$L = \sum_i \log p(t_i)$. Suppose $W$ is determined by the PCA; ML can
then be used to fit the model to the data and hence determine values for
the parameters $b$ and $A_d$ [27]. Using a soft clustering of the data
set and multiple PCAs corresponding to the clusters, a mixture
of latent models takes the form of $p(t) = \sum_{k=1}^{K_0} \pi_k \, p(t|k)$,
where $K_0$ is the number of components in the mixture, and the
parameters $\pi_k$ are the prior probabilities corresponding to the
components $p(t|k)$. Each component is an independent latent
model with PCA projection $W_k$ and parameters $b_k$ and $A_{dk}$.
This procedure can be further extended to a hierarchical mixture
model formulated by $p(t) = \sum_{k=1}^{K_0} \pi_k \sum_{j} \pi_{j|k} \, p(t|k, j)$,
where $p(t|k, j)$ again represent independent latent models [27].

Fig. 5. The hierarchical view of computed features for mass and nonmass
samples (Database A, see Table II).
With a soft partitioning of the data set via the EM algorithm, data
points will effectively belong to more than one cluster at any
given level. This step is automatically available in our approach
since the estimation of the parent latent model involves the
calculation of posterior probabilities denoted by $z_{ik}$. Thus, the
effective input values are $z_{ik} t_i$ for an independent visualization
space $k$ in the hierarchy. It should be emphasized that
"probabilistic" means both neural network based learning and posterior
probability weighted inputs. Further projections can again be
performed by using the effective input values $z_{ik} z_{j|k} t_i$ for the
visualization subspace $j$. Fig. 5 shows the hierarchical view of
computed features for mass and nonmass samples. In Fig. 5, a
hierarchical visualization view of a high dimensional feature data
set was generated using the hierarchical data visualization algorithm.
One hundred twenty-five real cases were involved, of which 75 are mass
sites and 50 are nonmass sites. Nine features were computed on the 125
cases, so the dimension of the resulting feature data set is
125 × 9 (Database A, see Table II). The hierarchical visualization tool
enables the visualization of a high dimensional data set through
dimension reduction and data modeling so that the distribution
features of the data set can be well recognized. For instance, the
clusters and subclusters of mass and nonmass data points and the
boundaries of the clusters can be revealed for further research
purposes.
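
The idea of projecting posterior-weighted inputs onto a few principal axes can be sketched as below. Note the substitution: the text describes a probabilistic principal component neural network, whereas this sketch uses a plain eigendecomposition of a posterior-weighted covariance matrix. The function name, its arguments, and the use of the weighted mean as the offset vector are assumptions for illustration.

```python
import numpy as np

def weighted_pca_projection(t, z, n_components=3):
    """Project data onto the leading axes of a posterior-weighted covariance.

    t: (N, d) feature vectors; z: (N,) posterior probabilities of one cluster.
    """
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    w = z / z.sum()                               # normalized posterior weights
    mean = w @ t                                  # posterior-weighted mean vector
    centered = t - mean
    cov = (w[:, None] * centered).T @ centered    # posterior-weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    axes = eigvecs[:, order]                      # d x n_components projection matrix
    return centered @ axes, axes, mean
```

At the top level all weights would be equal; at deeper levels the weights would be the products of posterior probabilities along the path in the hierarchy, so each sub-view emphasizes the points that belong to its own cluster.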

In the use of a hierarchical minimax entropy mixture model,
an interactive visualization environment is required to enable a
flexible computerized experiment such that human-database
interaction can be performed effectively. We have developed an
interactive environment for visualizing five-dimensional (5-D)
data sets, based on state-of-the-art computer graphics toolkits
such as object-oriented OpenGL and Open Inventor. With a
sophisticated set of various kinds of simulated lights, color
texturing editors, and 3-D manipulators and viewers (we have
integrated 3-D mouse and stereo glass units into our existing
system), our system allows one to examine the volumetric
data sets from any viewpoint and dynamically walk through their
internal structures to better understand the spatial relationships
among the clusters and decision surfaces present. One of the most
important features of our approach is to attach the decision
surface to the 3-D probability cloud in support of decision making,
and to link each data point in the visualization space to its raw
data so that the user can retrieve on-line the corresponding raw
data, such as an original image, for interim decision making.

TABLE II
The Summary of Experimental Databases

Database   Descriptions

A          Nine features extracted from 75 mass sites and 50 non-mass
           sites. Used for visualizing hierarchically projected high
           dimensional feature space. Result is presented in Figure 5.

B          A simulated two-dimensional feature space. Used to show the
           effect of model selection on decision boundary estimation.
           Result is shown in Figure 6.

C          ORL standard database. Used to show the improvement of PMNN
           with decision-based learning. Result is discussed in the text.

D          The training data set consisting of 50 mammograms, with 50
           true mass sites and 50 false mass sites. Three most
           discriminatory features are extracted. Used for both PMNN
           training and visualization. Result is given in Figure 7.

E          The testing data set consisting of 46 mammograms, with 23
           normal cases and 23 biopsy proven mass cases, each of them
           having at least one true mass site. Three most discriminatory
           features, the same as database D, are extracted. Used to test
           the overall performance of our CAD system prototype, where the
           mass candidates were selected automatically using the method
           reported in Part I. Result is shown in Figure 8 and also
           discussed in the text.

V.  Experimental  Results  and  Discussions 

In this section, we present the experimental results using the
information theoretic criteria and PMNNs to generate the
mapping function of the featured database, and the preliminary
results using the hierarchical minimax entropy projections to
conduct visual explanation of the decision making. For the
validation of the database mapping using the proposed algorithms,
the global relative entropy (GRE) value between the SFMD and
the joint histogram is used as an objective measure to evaluate
the fitness of the mapping function. A summary of the databases
we used in our study is presented in Table II.

As we have discussed in Sections III and IV, model selection
is the first and a very important learning task in mapping a
database, and the objective of the procedure is to determine
both the number and the kernel shape of local clusters in each
class. This procedure is used not only in the data mapping for
decision making but also in the structure decomposition for
hierarchical visual explanation. Our experience has suggested
that an incorrect model selection will affect the performance
of data-classification based decision making. For the sake of
simplicity, we discuss this conclusion in the following 2-D
example. Let us form a simulated featured database with two
major features that well characterize the two targeted classes,
as shown in Fig. 6 (Database B, see Table II). The ground
truth is that class 1 contains only one local cluster while class 2
contains two local clusters. With a model selection procedure
using the proposed criteria, the intrinsic data structure was
correctly identified. According to the principle of designing the
optimal structure of PMNN and visual explanation hierarchy,
the result of these criteria also determines the most appropriate
number of mixture components in the corresponding PMNN
and projected cluster decomposition. Two PMNNs with different
architecture orders were designed and trained to determine
the classification boundaries between the two classes. The
classification results are shown in Fig. 6(a) and (b). The result
in Fig. 6(a) is with the right cluster number in class 2, while
the result in Fig. 6(b) is with the wrong cluster number in
class 2. From this simple experiment, we have shown that
the decision boundary with the right cluster number may be
much more accurate than that with a heuristically determined
cluster number, since the decision boundary between class 1
and class 2 will be determined by four cross points in the first
case while in the second case the decision boundary will be
determined by only two cross points. It should be emphasized
that the error of data classification is theoretically controlled
by the accuracy in estimating the decision boundaries between
classes, and the quality of the boundary estimates is indeed
dependent upon the correct structure of the class likelihood
function.

As  we  have  discussed  before,  although  the  knowledge 
database  contains  both  machine-localized  and  human-selected 
cases, in clinical settings “unseen” and/or subtle cases contribute
the majority of false positives. We have also pilot tested the
PMNN method on the so-called “M + 1 classes” problem,
in  which  the  disease  pattern  under  testing  could  be  either 
from  one  of  the  M  classes,  or  from  some  other  unknown 
classes  (the  “unknown”  class  or  the  “intruder”  class).  Note  that 
the  unknown  class  probability  is  often  very  hard  to  estimate 
because  of  the  lack  of  sufficient  training  samples  (for  example, 
in  the  mass  detection  problem,  the  unknown  classes  include  the 
ROIsub over the normal tissues). In our experiment, PMNN
uses a different decision rule from that of the “M classes”
problem: pattern $x_i$ belongs to class $r$ if both of the following
conditions are true: a) $P(\omega_r \mid x_i) > P(\omega_j \mid x_i)$, $\forall j \neq r$, and b)
$P(\omega_r \mid x_i) > T$. $T$ is a threshold obtained by decision-based
Fig. 6. The classification examples with a two-dimensional (2-D) simulated database (Database B, see Table II). (a) Class 2 contains two local clusters. (b) Class
2 contains one local cluster.


learning. Otherwise pattern $x_i$ belongs to the unknown class.
We observed consistent and significant improvement in classification
results compared with the pure Bayesian decision. Using
the ORL (Olivetti Research Laboratory, Cambridge, U.K.)
standard database (Database C, see Table II), our experiments
have shown an increase in the correct detection rate from 70% to
90% [14].
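
The two-condition rule for the “M + 1 classes” problem can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-class scores are already available as an array (the name `scores` and the reject label `-1` are hypothetical), and it takes the threshold `T` as given rather than learning it.

```python
import numpy as np

UNKNOWN = -1  # hypothetical label for the "unknown"/"intruder" class


def classify_with_reject(scores, threshold):
    """Return the winning class index, or UNKNOWN if the rule rejects.

    scores    : 1-D array, one score per known class for a single pattern
    threshold : the value T from condition (b), assumed to be given
    """
    scores = np.asarray(scores, dtype=float)
    r = int(np.argmax(scores))
    others = np.delete(scores, r)
    # Condition (a): class r strictly beats every other class.
    wins = bool(np.all(scores[r] > others))
    # Condition (b): the winning score also exceeds the threshold T.
    confident = scores[r] > threshold
    return r if (wins and confident) else UNKNOWN
```

With `threshold` set to zero and no ties, the rule reduces to the ordinary maximum-score decision used for the “M classes” problem.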

In the third experiment, we use the proposed classifier to
distinguish true masses from false masses based on the features
extracted from the suspected regions. The objective is to reduce
the number of suspicious regions and identify the true masses.
For our study, 150 mammograms were selected, each containing
at least one mass of varying size and location.
The areas of suspicious masses were identified following the
proposed procedure with biopsy-proven results. Fifty mammograms
with biopsy-proven masses were selected from the 150
mammograms for training (Database D, see Table II). The
mammogram set used for testing contained 46 single-view
mammograms: 23 normal cases and 23 with biopsy-proven masses
(Database E, see Table II), which were also selected from the 150
mammograms. All mammograms were digitized with an image
resolution of 100 μm × 100 μm/pixel by a laser film digitizer
(Model: Lumiscan 150). The image sizes are 1792 × 2560 ×
12 bpp. For this study, we shrunk the digital mammograms to
a resolution of 400 μm by averaging 4 × 4 pixels into one
pixel. According to radiologists, small masses are 3-15 mm in
size, medium-sized masses are 15-30 mm, and large masses are
30-50 mm; the latter are rare in mammograms.
A 3-mm object in an original mammogram occupies 30 pixels
in a digitized image with a 100-μm resolution. After reducing
the image size by a factor of four, the object will occupy
about seven to eight pixels. An object seven pixels in size
is expected to be detectable by any computer algorithm.
Therefore, the shrinking step is applicable to mass cases and
can save computation time.
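
The shrinking step can be illustrated with a short sketch. It assumes the mammogram is held as a 2-D NumPy array; the function name and the handling of edge rows and columns that do not fill a whole block (they are dropped here) are choices made for this example, not details given in the text.

```python
import numpy as np


def shrink_by_averaging(image, factor=4):
    """Average non-overlapping factor x factor blocks into single pixels."""
    rows, cols = image.shape
    # Assumption: drop any trailing rows/columns that do not fill a block.
    rows -= rows % factor
    cols -= cols % factor
    blocks = image[:rows, :cols].astype(float).reshape(
        rows // factor, factor, cols // factor, factor
    )
    # Mean over the two within-block axes gives one value per block.
    return blocks.mean(axis=(1, 3))
```

A 1792 × 2560 image at 100 μm per pixel becomes 448 × 640 at 400 μm per pixel, so a 3-mm object that spanned 30 pixels spans about 7.5 pixels afterwards.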

After the segmentation, the area index feature was first used
to eliminate the nonmass regions. In our study, we set $A_1$ =
7 × 7 pixels and $A_2$ = 75 × 75 pixels as the thresholds. $A_1$
corresponds to the smallest size of masses (3 mm), and an object
with an area of 75 × 75 pixels corresponds to 30 mm in the
original mammogram. This indicates that the scheme can detect
all masses with sizes up to 30 mm. Masses larger than 30
mm are rare cases in the clinical setting. When the segmented
region satisfied the condition $A_1 < A < A_2$, the region was
considered to be suspicious for mass. For the purpose of
representative demonstration, we have selected a 3-D feature space
consisting of compactness I, compactness II, and difference
entropy. According to our investigation, these three features give
the best separation (discrimination) between the true and false
mass classes. It should be noted that the feature vector can
easily be extended to higher dimensionality. A training feature vector
set was constructed from 50 true mass ROIsub and 50 false mass
ROIsub (Database D, see Table II). The training set was used to
train two modular probabilistic decision-based neural networks
separately. In addition to the decision boundaries recommended
by the computer algorithms, a visual explanation interface has
also been integrated with 3-D to 2-D hierarchical projections.
Fig. 7(a) shows the database map projection with compactness
definition I and difference entropy. Fig. 7(b) shows the database
map projection with compactness definition II and difference
entropy. Our experience has suggested that the recognition
rate with compactness I is more reliable than that with
compactness II. In order to have more accurate texture information,
the computation of the second-order joint probability matrix
$P_{d,\theta}(i, j)$ is only based on the segmented region of the original
mammogram. For the shrunk mammograms, we found that
Fig. 8. One example of the mass detection using the proposed approach (Database E, see Table II).


the difference entropy had better discrimination with $d = 1$. The
difference entropy used in this study was the average of the values
at $\theta = 0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$.
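
To make the texture feature concrete, the sketch below computes a difference entropy from a second-order joint probability (co-occurrence) matrix restricted to a segmented region, at distance 1, and averages it over the four directions. Several details are assumptions made for illustration only: the standard Haralick form of difference entropy, a symmetric co-occurrence count, a base-2 logarithm, gray levels already quantized to integers in `[0, levels)`, and all function names.

```python
import numpy as np

# Pixel offsets (row, col) for distance d = 1 at each direction in degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(img, mask, offset, levels):
    """Symmetric, normalized co-occurrence matrix over masked pixel pairs."""
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            inside = 0 <= r2 < rows and 0 <= c2 < cols
            # Only count pairs whose two pixels both lie in the region.
            if inside and mask[r, c] and mask[r2, c2]:
                P[img[r, c], img[r2, c2]] += 1
                P[img[r2, c2], img[r, c]] += 1
    total = P.sum()
    return P / total if total > 0 else P


def difference_entropy(P):
    """Entropy of the distribution of |i - j| under P (assumed Haralick form)."""
    levels = P.shape[0]
    i, j = np.indices(P.shape)
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=P.ravel(),
                         minlength=levels)
    nz = p_diff > 0
    return float(-np.sum(p_diff[nz] * np.log2(p_diff[nz])))


def mean_difference_entropy(img, mask, levels):
    """Average the difference entropy over 0, 45, 90, and 135 degrees."""
    values = [difference_entropy(cooccurrence(img, mask, off, levels))
              for off in OFFSETS.values()]
    return float(np.mean(values))
```

A perfectly uniform region gives a value of zero, since every pixel pair has the same gray-level difference.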

We have conducted a preliminary study to evaluate the
performance of the algorithms in real case detection, in which 6-15
suspected masses/mammogram were detected and required further
clinical decision making. We found that the proposed classifier
can reduce the number of suspicious masses with a sensitivity
of 84% at 1.6 false positive findings/mammogram based
on the testing data set containing 46 mammograms (23 of them
have biopsy-proven masses) (Database E, see Table II). Fig. 8
shows a representative mass detection result on one mammogram
with a stellate mass. After the enhancement, the ten regions
with the brightest intensity were segmented. Using the area
criterion, regions that were too large or too small were eliminated
first, and the remaining regions were submitted to the PMNN for
further evaluation. The results indicated that the stellate mass
lesion was correctly detected.
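
The area criterion applied before the PMNN can be sketched in a few lines. The thresholds are the 7 × 7 and 75 × 75 pixel values stated for the 400-μm images; the function name and the representation of regions as a list of pixel areas are assumptions for this example.

```python
A1 = 7 * 7     # lower area threshold in pixels (about 3 mm at 400 um/pixel)
A2 = 75 * 75   # upper area threshold in pixels (about 30 mm at 400 um/pixel)


def suspicious_region_indices(region_areas, a_min=A1, a_max=A2):
    """Return indices of regions whose area A satisfies a_min < A < a_max."""
    return [idx for idx, area in enumerate(region_areas)
            if a_min < area < a_max]
```

Only the regions returned by this filter would be passed on to the classifier; regions at or outside the thresholds are discarded first.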

For further evaluation, the receiver operating characteristic
(ROC) method may be employed. However, we do not feel that
ROC analysis would really provide a better evaluation in this
case; rather, it is an alternative method. First, most ROC analyses
reported by others were based on different databases and thus are
not comparable, since ROC results are highly data-dependent.
Second, ROC analysis only indicates an “overall” performance,
with at least two limitations: it is a multithreshold measure, so
the corresponding system may not be optimal for a particular
application where only one threshold is needed; and it cannot
provide mathematically traceable feedback to improve the
performance of the system or of any one component in the system.
Third, the currently used FROC analysis package imposes several
assumptions on the distributions of the cases which are invalid
in most applications and particularly untrue in our situation. For
example, our assumption about the data distributions is SFNM,
which is clearly different from the restricted conditions imposed
by the existing FROC analysis algorithm. In our
approach, a quantitative mapping of the knowledge database
is performed with hierarchical SFMD modeling and should
be perfectly (at least in the theoretical sense) carried out by
the corresponding PMNN classifier. In other words, optimal
decision making should have already been achieved according
to the Bayesian rule. It is reasonable to acknowledge that,
in order to compare the overall performance with that of other
systems, an ROC study may be further conducted. We are
currently working on developing a new generation of FROC
analysis package, taking care to remove the aforementioned
problems.

Another important consideration with the present approach
is the measure of quality in visual explanation [29]. This is not
a glamorous area, but progress in this area is critical
to the future success of visual exploration [28]. What is the correct
matrix for a direct projection of a particular multimodal data
set? How effective was a particular visualization tool? Did the
user come to the correct conclusion? It is generally agreed that
the benchmark criteria in visual exploration are very different
and difficult [28]. Like Bishop and Tipping [27], we
believe that in data visualization there is no objective measure
of quality, so it is difficult to quantify the merit of a particular
data visualization technique, and the effectiveness of such
a technique is often highly data-dependent. A possible alternative
is to perform a rigorous psychological evaluation in a
simple and controlled environment, or to invite domain experts
to directly evaluate the efficacy of the algorithm for a specified
task. For example, we can compare the domain experts' performance
with and without the system aid. In that case, the ROC
method may be used to evaluate the performance of our algorithm
when used by the radiologists. While the optimality of
these new techniques is often highly data-dependent, we would
expect the hierarchical visualization model to be a very effective
tool for data visualization and exploration in many applications.

In summary, we employed a mathematical feature extraction
procedure to construct the featured knowledge database from
all the suspicious mass sites localized by the enhanced segmentation.
The optimal mapping of the data points was then obtained
by learning the generalized normal mixtures and decision
boundaries. A visual explanation of the decision making was
further developed as a decision support, based on an interactive
visualization hierarchy through the probabilistic principal component
projections of the knowledge database and the localized
optimal displays of the retrieved raw data. A prototype system
was developed and pilot tested to demonstrate the applicability
of this framework to mammographic mass detection.

Abstract. Three neural network models were employed to evaluate their performances in the recognition of
medical  image  patterns  associated  with  lung  cancer  and  breast  cancer  in  radiography.  The  first  method  was  a  pattern 
match  neural  network.  The  second  was  a  conventional  backpropagation  neural  network.  The  third  method  was  a 
backpropagation  trained  neocognitron  in  which  the  signal  propagation  is  operated  with  the  convolution  calculation 
from  one  layer  to  the  next.  In  the  convolution  neural  network  (CNN)  experiment,  several  output  association  methods 
and  trainer  imposed  driving  functions  in  conjunction  with  the  convolution  neural  network  are  proposed  for  general 
medical  image  pattern  recognition.  An  unconventional  method  of  applying  rotation  and  shift  invariance  is  also  used 
to  enhance  the  performance  of  the  neural  nets. 

We  have  tested  these  methods  for  the  detection  of  microcalcifications  on  mammograms  and  lung  nodules  on  chest 
radiographs. Pre-scan methods were described in our earlier publications. The artificial neural networks
act as final detection classifiers to determine whether a disease pattern is present in the suspected image area. We found
that  the  convolution  neural  network,  which  internally  performs  feature  extraction  and  classification,  achieves  the 
best  performance  among  the  three  neural  network  models.  These  results  show  that  some  processing  associated  with 
disease  feature  extraction  is  a  necessary  step  before  a  classifier  can  make  an  accurate  determination. 


1.  Introduction 

Clinical studies on the use of chest radiographs for the
detection of lung nodules, including those reported by
Stitik [1] and Heelan [2], have demonstrated that even
highly skilled and highly motivated radiologists,
task-directed to detect any finding of suspicion for a
pulmonary nodule and working with high-quality chest
radiographs, still fail to detect more than 30 percent of
the lung cancers that can be detected retrospectively. In
the series reported by Stitik, many of the missed lesions
would be classified as T1NxMx lesions, the stage of
non-small cell lung cancer that C. Mountain indicates
has the best prognosis (42%, 5-year survival) [3]. This
is the stage (nodules 0.3-2 cm in diameter, separate
from the hilum) of lung cancer that a computer-assisted
diagnostic program should tackle. Figure 1 shows a
chest radiograph containing a nodule overlapped by a
rib. This is a rather typical case, because 40% of the
lungs are covered by posterior ribs or rib crossings.

Although mammography has a high sensitivity for
detection of breast cancers when compared to other
diagnostic modalities, studies indicate that radiologists
do not detect all carcinomas that are visible in
retrospective analyses of the images [4-6]. These missed
detections are often a result of the very subtle nature of
the radiographic findings. However, many missed
diagnoses can be attributed to human factors such as
subjective or varying decision criteria, distraction by other
image features, or simple oversight [7, 8]. Early breast
cancers are often characterized by masses and clustered
microcalcifications [9]. It has been reported that


Figure 1. A chest radiograph showing a nodule overlapped by a rib.


between 40% and 50% of breast carcinomas detected
radiographically demonstrate masses on mammograms
[10, 11]; 30-50% of breast carcinomas present as
microcalcifications, and 60-80% of breast carcinomas
reveal microcalcifications upon histologic examination
[11-13]. Breast cancer patterns associated with
masses will be discussed in our future papers. Clustered
microcalcifications associated with breast cancer are one
of the two disease objects studied in this paper. Typically,
the sizes of microcalcifications vary from 0.16 mm to
1.0 mm. Figure 2 shows a mammogram containing
clustered microcalcifications which are surrounded by
dense glandular tissues.

Various computer-based image perception techniques
have been proposed for the detection of disease
patterns [14, 15]. With each of these methods there is a
trade-off between increased sensitivity and decreased
specificity. In general, by setting less stringent criteria
on computer algorithms, the sensitivity of the detecting
programs can be increased. However, when using
any of these methods to detect subtle diseases, we must
use additional methods to decrease the number of false
positives. For this reason, several investigators have
attempted to use various advanced image processing
and artificial classifiers to improve disease detection
[16-18].


Figure  2.  A  mammogram  showing  clustered  microcalcifications. 
The  cluster  area  in  original  size  is  enhanced  by  a  local  histogram 
equalization  process  for  display  purposes. 


Many artificial neural network models have recently
been applied to diagnostic imaging research [19, 20].
These research efforts are aimed at assisting radiologists
either by improving the accuracy of quantitative
measures or by improving the sensitivity and specificity
of disease detection. In diagnostic imaging, neural network
techniques combined with image processing methods
have become a major research trend in the field
of computer-aided diagnosis. Medical diagnoses involve
very sophisticated decision-making processes.
We, therefore, limited our studies to the recognition
of specific disease patterns. In this paper, we will also
discuss characteristics of some disease patterns in clinical
images and their implications for neural network
classification.


2.  Materials  and  Research  Objectives 

2.1. Disease Patterns on Projection X-Ray Images

Projection radiographs shown on films are generated
by the transmission of X-ray beams through a patient.
The resulting X-rays, of varying intensity, form a
radiographic image. For many years, this technique has been
used as a diagnostic procedure for screening or primary
examination of a disease associated with physical tissue
changes. The major drawback of projection radiography
is that X-ray beams project the original anatomical
three-dimensional objects onto a two-dimensional image.
In other words, each pixel intensity on the image
represents the total X-ray attenuation integrated along a
line passing through the patient. Bone, soft tissue,
and abnormal changes of tissue can be distinguished
from one another in an X-ray image because
they attenuate X-rays differently. However, subtle
abnormalities superimposed on various normal tissues
and bones are difficult to discern. The degree of
sophistication in recognition of disease patterns in these
images, which requires professional training, differs
significantly from that of character recognition or
other image pattern recognition. The degree of difficulty
is not easy to measure. Qualitatively speaking,
the ratio of signal to structure noise in the task of
disease pattern recognition can be very small. Consider a
local suspected area that may or may not contain a
disease pattern, $s(x, y) \approx d(x, y) \in P_d$, where $d(x, y)$
represents an image patch that has been proven to be
a disease pattern. The collected set of these proven
patches is called $P_d$. This local area often contains
some background information resulting from normal
tissues, $b(x, y) \in B$. The total intensity function,
denoted as $I(x, y)$, is given by

$$I(x, y) = s(x, y) + b(x, y). \qquad (1)$$

In  general,  four  situations  are  possible  in  a  suspected 
area: 

(a) $s(x, y) \gg b(x, y)$ (i.e., high signal-to-background
ratio), representing obvious true cases;

(b) $s(x, y) \ll b(x, y)$, representing subtle cases;

(c) $s(x, y) = 0$ and $b(x, y)$ is similar to one of $d(x, y)$,
where $d \in P_d$; and

(d) $s(x, y) = 0$ and $b(x, y)$ is not similar to any disease
patterns, representing obvious false cases.

Most cases falling in situation (b) result in
false-negatives. Cases associated with situation (c) may
produce a false-positive by a classifier.

Pattern match and backpropagation, two commonly
used pattern classifiers, were employed to compare the
performance in the detection of clustered microcalcifications
selected from mammograms and the detection
of lung nodules extracted from chest radiographs.
Regions of interest (ROI), formatted at 32 × 32 × 12 bit,
normal or abnormal, were extracted by the corresponding
methods previously described [16, 17]. Both the
geometrical pattern and the relative intensity of a local area
on a radiographic image are important information in
a radiographic reading. The background trend of each
ROI was removed to eliminate low-frequency variation
[16]. However, the background structures (i.e.,
radiographic image of bone on chest image, vessels, and
large soft tissue differences) remained in each ROI. No
normalization procedure was applied, because normalization
can mix a disease pattern with a non-disease
pattern. For example, (a) small nodules and end-on
vessels and (b) microcalcifications and film defects
basically differ only in contrast. They would not be
distinguishable if the contrast feature were normalized in
the pre-processing. Since many disease patterns are
superimposed on background structures, we have not
found an unsupervised training technique that succeeds
with our database. Three supervised training methods,
however, achieved some success and are discussed in
the following sections.

2.2. Disease Pattern Characteristics
of Microcalcifications on Mammograms
and Lung Nodules on Chest Radiographs

In general, the larger the nodule, the higher the contrast
of the nodule profile on the radiograph. Small rounded
objects possessing high contrast are most likely end-on
vessels. In addition, the size of end-on vessels is
inversely proportional to their distance from the center
of the heart. This is because the larger arteries and
vessels are anatomically distributed closer to the heart.
Clinical instruction indicates that faint tails of the vessel
turned in a horizontal direction may be observable.
A rib crossing, which sometimes looks like an opaque
round object, can also produce a false-positive detection.
See Figure 3 for examples of end-on vessels, rib
crossings, and true nodules.

On the other hand, the gray value differences (i.e.,
contrast) between the peak of microcalcifications and
local background tissue are somewhat proportional to
the size of the calcifications on mammograms. Film
defects, caused by scratches of the screen/film system or cold
spots of film emulsion, are high-contrast bright spots.
The contrast of film defects is independent of size.
Several image blocks shown in Figure 4 demonstrate
the difference between microcalcifications and film
defects. All image blocks were randomly selected from
our database and processed by a histogram expansion
for display purposes. It is essential to use a sufficiently
small digitization spot to preserve the disease pattern.
Potential problems of using a large digitization spot for
Figure 3. The upper 4 rows show 64 nodule blocks sampled from the database. Each image block on rows 5 and 6 contains no nodule but a
lung or no structure. Each image block on the bottom two rows contains an end-on vessel.


Figure 4. Each image block, extracted from a mammogram, on the upper 4 rows contains at least one calcification. Each image block on the
bottom 4 rows contains at least one local maximum value of gray scale (bright spot) which is not a calcification. Each block at matrix elements
(1, 4), (5, 4), (7, 4), (9, 4), and (2, 6) contains a bright spot due to a film defect.


acquiring mammographic images are: (a) the edge of a
small film defect can be blurred and (b) very small
microcalcifications are not actually digitized. These problems
are less pronounced with a digitization spot size
of 0.1 mm, which was the specification of the Lumisys
laser film scanner (Lumiscan Model 150).

Chest images were digitized and reformatted (shrunk
by pixel averaging) to a matrix size of
512 × 625 × 12 bits per image, with each pixel
representing a 0.7 mm × 0.7 mm square area. Mammograms
were digitized in a computer format of
2048 × 2500 × 12 bits per image, with each pixel
representing a 0.1 mm × 0.1 mm square area. The suspected
microcalcification patches shown in Figure 4 are for
display purposes. In the study of microcalcification
detection, only the central region of 16 × 16 pixels (i.e.,
1.6 mm × 1.6 mm) was used as input for the performance
evaluation of the three neural network systems.

3. Comparative Studies Using Neural Networks

3.1. Associated Memory Based Pattern Match
Neural  Networks  for  Disease  Detection 

A classifier takes a feature vector and produces a
classification. The core portion of the pattern match classifier
searches for the most similar pattern in the memory.
If no pattern match is found in the memory, a new
pattern is created and stored for that particular
classification in the memory during the training. Several
neural networks belong to this type of pattern match:
(a) adaptive resonance theory (ART) and its extensions
(i.e., ART-2 [21], ARTMAP [22], etc.), (b) category
learning originated by Reilly et al., 1992, known as
the RCE method [23], and (c) Dynamic Stable Associate
Learning (DYSTAL) [24-26].

We used the processed image block (i.e., patch) as
the input feature vector. Many feature vectors of this
kind may be “contaminated” by original background
structures, which makes them difficult to discern as disease
patterns or background patterns. The authors are aware
that it is important to extract features representing various
aspects of disease patterns prior to the classification
task. However, our goal was to compare which method
better distinguishes disease patterns from non-disease
image patterns using the image patches as input data.
Since DYSTAL was originally designed to use image
data as input for a classification task, it was selected as
one of the methods for the study.

In  DYSTAL,  there  are  three  rules  for  aggregating  the 
input  feature  vector  and  propagating  the  signals: 

(a) the aggregation rules are based on the correlation
between the input feature vectors and learned
patterns (the correlation measures the similarity
between the inputs and learned patterns),

(b) the propagation rule depends on the maximum
number of these resulting similarity values, and

(c) the learning rule permits the system to maintain
learning patterns as needed.

The similarity measure is defined as the correlation
of a learning pattern and the input feature vector [26]:

$$S_j = CC(P^j, I) = \frac{\sum_i (P_i^j - \bar{P}^j)(I_i - \bar{I})}{\sqrt{\sum_i (P_i^j - \bar{P}^j)^2 \times \sum_i (I_i - \bar{I})^2}} \qquad (2)$$

where $P_i^j$ is the value of the $i$th element of the $j$th
patch vector, $\bar{P}^j$ is the mean value of the elements of
the patch $P^j$, $I_i$ is the value of the $i$th element of the
input feature vector, and $\bar{I}$ is the mean value of the
elements of the inputs. This similarity measure uses
the cosine of the angle between the two vectors $I$ and
$P^j$ in the $n$-dimensional hyper-space, where $n$ is also
the number of patch elements.
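
As an illustration of the similarity measure in Eq. (2), the following sketch computes the correlation between a stored patch and an input vector. The function name is hypothetical, and returning zero when either vector has no variation is an assumption made here, since the formula is undefined in that case.

```python
import numpy as np


def correlation_similarity(patch, inputs):
    """Correlation coefficient between a stored patch and an input vector."""
    p = np.asarray(patch, dtype=float).ravel()
    x = np.asarray(inputs, dtype=float).ravel()
    # Subtract each vector's own mean, as in Eq. (2).
    p_c = p - p.mean()
    x_c = x - x.mean()
    denom = np.sqrt(np.sum(p_c ** 2) * np.sum(x_c ** 2))
    if denom == 0.0:
        # Assumption: a constant vector carries no pattern, so report 0.
        return 0.0
    return float(np.sum(p_c * x_c) / denom)
```

Because the means are removed and the result is normalized, the value is unchanged when an input is offset by a constant or scaled by a positive factor; it ranges from -1 to 1.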

DYSTAL also uses the winner-take-all approach
of propagating the maximum similarity. If the maximum
similarity is lower than a pre-defined value, the new
feature vector will be stored as a newly learned pattern
in the memory. The learned pattern is then assigned
to an associated class, which is either a true or a false
disease case.
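
The store-if-unfamiliar behaviour described above can be sketched as a small pattern memory. This is only an illustration of the winner-take-all and storage rules as stated in the text, not the DYSTAL algorithm itself: the class name, method names, and the `min_similarity` parameter are hypothetical, and other aspects of DYSTAL (such as how stored patterns are updated) are not modelled.

```python
import numpy as np


def similarity(p, x):
    """Correlation between two vectors (zero if either is constant)."""
    p = np.asarray(p, dtype=float).ravel()
    x = np.asarray(x, dtype=float).ravel()
    p = p - p.mean()
    x = x - x.mean()
    denom = np.sqrt((p @ p) * (x @ x))
    return float(p @ x / denom) if denom > 0 else 0.0


class PatternMemory:
    def __init__(self, min_similarity):
        self.min_similarity = min_similarity  # the pre-defined value
        self.patterns = []                    # learned patches
        self.labels = []                      # True / False disease case

    def best_match(self, x):
        """Winner-take-all: index and similarity of the closest pattern."""
        if not self.patterns:
            return None, -1.0
        sims = [similarity(p, x) for p in self.patterns]
        j = int(np.argmax(sims))
        return j, sims[j]

    def train(self, x, label):
        """Store x as a new pattern if nothing in memory is similar enough."""
        j, s = self.best_match(x)
        if j is None or s < self.min_similarity:
            self.patterns.append(np.asarray(x, dtype=float).copy())
            self.labels.append(label)

    def classify(self, x):
        """Return the class of the most similar stored pattern."""
        j, _ = self.best_match(x)
        return None if j is None else self.labels[j]
```

A higher `min_similarity` makes the memory store more patterns, trading memory size for finer discrimination.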

3.2.  Convolution  Neural  Network  for  Disease 
Pattern  Recognition 

The connections between nodes in the conventional
backpropagation neural network (BPNN) spread
uniformly from a front layer to a back layer [27]. However,
it is known that the correlation between neighboring
pixels on an image is usually higher than that between
distant pixels. It is conceivable that features
associated with nearby pixels
should be emphasized. In neural network terms,
local signal interactions rather than non-local interactions
should be established to guide the neural network
learning. A convolution neural network (CNN),
whose nets are locally formed, was selected as one of the
classification methods in the experiment. The structure
of the CNN is a simplified version of the neocognitron
[28, 29]. We used only a 2 hidden-layer structure
and eliminated all the complex-cell layers. Nets between
two adjacent layers were selectively interconnected
across groups. We modified the neocognitron
network structure and used a convolution-constrained
backpropagation method for the training. This modification
is necessary because (a) the original neocognitron
is designed for a binary image, (b) the original
9 hidden-layer structure is very computationally intensive
for an iterative training method such as the BPNN, and
(c) a one or two hidden-layer structure is considered
adequate for relatively simple image patterns such as
lung nodules and microcalcifications. Figure 5 shows
the fundamental structure of this neural network.

In the CNN signal processing, each group in the
receiving layer gets signals from a group of weights
(i.e., kernels). For the forward signal propagation, the
result of convolving the weighting factors of the kernel
with the element values of the front layer is collected
Figure 5. Artificial convolution neural network for disease pattern recognition.


onto  the  corresponding  matrix  elements  of  the  receiv¬ 
ing  layer.  This  operation  accounts  for  the  major  differ¬ 
ence  between  the  convolution  type  neural  network  and 
regular  fully  connected  neural  network.  In  the  lung  nod¬ 
ule  study,  we  used  an  image  patch  size  of  32  x  32  (i.e., 
21.4  mm  x  21.4  mm)  with  a  convolution  kernel  size  of 
7  X  7.  In  the  study  of  microcalcification  detection,  the 
central  region  of  16  x  16  pixels  (i.e.,  1.6  mm  x  1.6  mm) 
of  the  original  image  patch  size  of  32  x  32  with  a  con¬ 
volution  kernel  of  5  x  5  was  used.  The  choices  of  using 
7x7  and  5x5  convolution  kernels  were  based  on  ex¬ 
tensive  studies  [30,  31]  in  lung  nodule  and  microcalci¬ 
fication  cases,  respectively.  One  reason  for  using  much 
smaller  size  kernel  is  that  microcalcifications  are  very 
tiny  compared  to  observable  lung  nodules.  In  addition, 
small  kernels  are  appropriate  for  small  objects  for  eval¬ 
uating  the  difference  between  true  and  false  microcal¬ 
cifications.  Each  hidden  layer  consists  of  10  groups. 
The  output  layer  has  10  nodes  (2  categories)  which 
were  fully  connected  to  the  second  hidden  layer. 

3.3. Training of Neural Networks

3.3.1. Classification Invariance of Matrix Operations. In general, medical image patterns possess either a circular symmetric shape (e.g., nodules) or appear as small objects with a variety of geometric patterns (e.g., calcifications). In such cases, image pattern recognition does not call on top-down or left-right geometry as classification criteria. Therefore, we can take advantage of this characteristic as an invariance. In other words, we can rotate and/or shift the input vector two-dimensionally and maintain the same output assignments for the training. This method may have two effects on the neural network: (i) to instruct the neural network that the rotation and shift of the input vector would receive the same classification result; and (ii) to increase the total number of training samples, which is expected to enhance the performance of the neural network. To test our hypothesis, we presented each suspected image block in only 8 orientations. Four of them are rotations of 0°, 90°, 180°, and 270°. The other four were obtained by flipping the original image matrix left-right and applying the same rotations again.
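The eight orientations described above (four rotations of the block and four rotations of its left-right mirror image) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the function names and the tiny 2 x 2 example patch are assumptions made for the example.

```python
def rotate90(block):
    """Rotate a square 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def flip_lr(block):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in block]


def eight_orientations(block):
    """Return 4 rotations of the block, then 4 rotations of its mirror image."""
    views = []
    for start in (block, flip_lr(block)):
        current = [list(row) for row in start]
        for _ in range(4):  # 0, 90, 180, 270 degrees
            views.append(current)
            current = rotate90(current)
    return views


# Hypothetical 2 x 2 patch; in the paper the blocks are 32 x 32 or 16 x 16.
patch = [[1, 2],
         [3, 4]]
views = eight_orientations(patch)
```

All eight views would share the output assignment of the original block during training, which is how the rotation and flip invariance is taught to the network.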

3.3.2. Modification of Backpropagation Training for the CNN. As indicated in Section 2.2, a high signal of a feature can result from a negative object, such as higher contrasts in end-on vessels than those in nodules and higher peak values in film defects than those of microcalcifications. Therefore, we used a Gaussian-like activation function for the cumulated signal propagation between the input layer and the first hidden layer. The purpose of this activation function is to treat both low and high cumulated signals as false features, which would eventually facilitate the classification process in the following layers. This Gaussian-like activation function would also be appropriate for the BPNN using an image block as the vector described in Section 3.5. In the conventional BPNN, fully connected rather than locally connected networks were implemented.

Figure 6. Fuzzy output association is constructed by a Gaussian and a trainer imposed repulsive function. Note that only one curve is used for a training case associated with an output target node (e.g., node 7 represents the activated fuzzy function).

We used the sigmoid activation function for the forward signal propagation for all layers other than the first hidden layer and applied backpropagation training for the adjustment of weights between any two adjacent layers. The main difference between conventional weights and kernel weights is that the former are independent and the latter are constrained by grouping. By looking at the CNN processing, one may find that signals are filtered and modulated as in a circuit system. Signal propagation from one layer to the next is composed of: (a) an adaptive convolution combiner and (b) activation functions (a Gaussian-like function, Eq. (3), for the first hidden layer and a sigmoid function, Eq. (4), for the other layers; see Fig. 5), which are given below:

where $S_x((i, j); n)$ represents the signal at node (i, j), nth group, and x layer; $k_x((u, v); n)$ denotes a weighting factor value at net (u, v), nth group, connecting from layer x - 1 to layer x; and $m \to n$ represents those in group m that connect to group n.
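A rough sketch of the two-part signal propagation described above (a convolution combiner followed by an activation function) is given below. It is only an illustration under stated assumptions: the layer and kernel are square, the kernel is applied only where it fits entirely inside the layer, a single source group feeds a single receiving group, and the bell-shaped form chosen for `gaussian_like` (with its `center` and `width` parameters) is a hypothetical stand-in for the paper's Gaussian-like function.

```python
import math


def convolve_valid(layer, kernel):
    """Weighted sum of the kernel over every position where it fits in the layer.

    No kernel flip is applied, so strictly this is a cross-correlation.
    """
    k = len(kernel)
    out_size = len(layer) - k + 1
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            out[i][j] = sum(
                kernel[u][v] * layer[i + u][j + v]
                for u in range(k)
                for v in range(k)
            )
    return out


def gaussian_like(s, center=0.0, width=1.0):
    """Assumed bell-shaped activation: low and high signals both map near 0."""
    return math.exp(-((s - center) ** 2) / (2.0 * width ** 2))


def sigmoid(s):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-s))


def forward_group(layer, kernel, activation):
    """Convolution combiner followed by an element-wise activation."""
    return [[activation(s) for s in row] for row in convolve_valid(layer, kernel)]


# Hypothetical 32 x 32 patch and 7 x 7 averaging kernel (sizes from the nodule study).
patch = [[1.0] * 32 for _ in range(32)]
kernel = [[1.0 / 49.0] * 7 for _ in range(7)]
hidden = forward_group(patch, kernel, gaussian_like)
```

Under these assumptions a 32 x 32 patch and a 7 x 7 kernel give a 26 x 26 receiving group; the constraint that all positions share one kernel is what distinguishes these weights from independent fully connected weights.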

3.3.3. Backpropagation Neural Network Trained by Radiologists. We modeled radiologists’ diagnostic rating (i.e., the probability of a disease existing in a suspected area) and incorporated it into the neural network training. In fact, when a radiologist determines a specific probability of a disease pattern in an image area based on his/her training and experience, this probability would be accompanied with a variation (or a standard deviation). An asymmetric output association distribution is shown in Figure 6. The use of asymmetric fuzzy assignment attempted to direct non-disease cases toward low value nodes and to push disease cases toward high value nodes. With this fuzzy assignment for the output nodes in the training, the relation between adjacent nodes was established. This supervised training can be generally applied to any situation where an association of outputs is necessary.


3.4.  Classification  of  Output  Values  in  the  Testing 


Corresponding to the grading system arranged in the training, a polarized (linearly weighted) function is given as an indication. In practice, we can define a normalized disease detection index (NDDI) for the judgment of a suspected area:

$$\mathrm{NDDI} = \frac{\sum_{n \in \text{true nodes}} \left[ O_n \times (n - n_0 + 1) \right]}{N - n_0 + 1} \qquad (5)$$


where n denotes the node in the output layer, $n_0$ is the node number of the least likely true node, $O_n$ is the output value at node n, and N is the total number of output nodes. Hence, a nodule detection index of 0 indicates a definite non-nodule and a nodule detection index of 1 or greater implies a definite nodule case determined by the neural network. The calculated NDDIs were evaluated by the receiver operating characteristic (ROC) analysis to measure the performance of the neural network. In general, $A_z$, representing the area underneath the ROC curve, is an index which signifies the performance of a system. The ROC curve is formed with the true-positive rates versus the false-positive rates of a system. We also used a performance measure, relative detection accuracy, converted from the curve to compare the results of the neural network systems.

Figure 7. Artificial backpropagation neural network with fully connected nodes for disease pattern recognition.
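The index and its ROC evaluation can be illustrated with the short sketch below. Two assumptions are made loudly here: the index is taken to be normalized by (N - n0 + 1), so that full activation of the top node alone gives 1, and the area under the ROC curve is estimated with a simple nonparametric rank statistic, which is not necessarily the curve-fitting procedure used to obtain $A_z$ in the study. The function names and example values are hypothetical.

```python
def nddi(outputs, true_nodes, n0):
    """Linearly weighted sum of the true-node outputs, normalized by N - n0 + 1.

    outputs[k] holds O_n for node n = k + 1; n0 is the least likely true node.
    The normalization is an assumption of this sketch.
    """
    total_nodes = len(outputs)
    weighted = sum(outputs[n - 1] * (n - n0 + 1) for n in true_nodes)
    return weighted / (total_nodes - n0 + 1)


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical 10-node output layer in which nodes 6..10 are the true nodes.
true_nodes = range(6, 11)
definite_case = nddi([0.0] * 9 + [1.0], true_nodes, n0=6)      # only node 10 fires
definite_negative = nddi([1.0] + [0.0] * 9, true_nodes, n0=6)  # only node 1 fires
auc = empirical_auc([0.9, 0.3], [0.5, 0.1])
```

With these inputs the first case evaluates to 1 and the second to 0, matching the interpretation of the index given in the text.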


3.5. Backpropagation Neural Network Technique for Disease Pattern Recognition

We have also investigated the performance of the conventional backpropagation (BP) neural network with (BP/IH) and without (BP/OH) a hidden layer. In other words, the background reduced image pixel values were used as input signals for the input layer. We expected that the hidden layer would serve as a feature extractor. The same training and testing data sets, which again were “contaminated” (i.e., very strong background structures such as a rib overlapping a nodule) and used in the pattern match neural network, were entered into the BP neural network. Basically, we set up the experiment as described in Section 3.2 with one exception: the fully connected neural nets are used to compare the effectiveness of the neural network architecture designs. The structure of the BPNN with one hidden layer is shown in Figure 7. We tried several arrangements for the number of nodes used in the hidden layer. Our experiment indicated that approximately 200 nodes and 450 nodes in the hidden layer would be appropriate for nodule and microcalcification studies, respectively. These numbers may be altered according to the size of the image block used.


4.  Results 

4.1. Detection of Clustered Microcalcifications

After the pre-scan process by the computer program, 38 digital mammograms provided 220 true and 1132 false subtle microcalcifications. For the neural network studies, we divided the mammograms into two sets: ($A_m$) 19 images (containing 108 true and 583 false image blocks) and ($B_m$) another set of 19 images (containing 112 true and 549 false image blocks). We did not ask radiologists to rate image blocks in the training set. Therefore, only 2 output nodes with 8 rotated input patches were used. Neither output association nor a trainer imposed function was employed. We found that the use of a small image block of 16 x 16 resulted in the best performance in the detection of single microcalcification [31].

Table 1. Performance of neural networks in the detection of clustered microcalcifications using group $A_m$ as training set and group $B_m$ as testing set.

Neural networks                     DYSTAL   BP/OH   BP/IH   CNN
$A_z$ (Area under the ROC curve)    0.78     0.75    0.86    0.97
Detection accuracy
  (% true-positive detection)       70       70      75      90
  (# false-positive per image)      4.3      4.5     3.5     0.5

Table 2. Performance of neural networks in the detection of clustered microcalcifications using group $B_m$ as training set and group $A_m$ as testing set.

Neural networks                     DYSTAL   BP/OH   BP/IH   CNN
$A_z$ (Area under the ROC curve)    0.76     0.77    0.84    0.97
Detection accuracy
  (% true-positive detection)       70       70      75      90
  (# false-positive per image)      4.3      4.2     3.7     0.5

Tables 1 and 2 show the performance resulting from the three neural network systems. DYSTAL and BP/OH, acting as classifiers, show the lowest performance. The best performance index ($A_z$) was 0.90 when the determination was based on individual microcalcifications and improved to 0.97 when the determination was based on the clustered microcalcifications using CNN. In the latter evaluation, suspected clusters including only 1 or 2 spots within a 1 cm area were rejected and the average NDDI taken from the clustered spots was used for the ROC evaluation. This is because the detection of clustered microcalcifications is more clinically significant than that of individual calcifications, since clustered microcalcifications (3 or more) are a strong indication of breast carcinoma in radiological diagnosis. The comparative study was based on a two-step detection strategy: (i) first detect suspected individual microcalcifications and (ii) then cluster them as a group when possible; otherwise the detection was rejected.
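The cluster-level evaluation described above can be sketched as follows. The grouping rule is an assumption made for illustration: two spots are linked when both coordinate differences are at most 10 mm, linked spots are merged transitively (so a long chain of spots could exceed 1 cm overall), groups with fewer than 3 spots are rejected, and each remaining cluster is scored by its mean NDDI. The function name, the tuple layout, and the example coordinates are hypothetical.

```python
def cluster_spots(spots, window_mm=10.0, min_spots=3):
    """Group spots lying within the window of each other; score kept clusters.

    spots: list of (x_mm, y_mm, nddi) tuples.
    Returns a list of (spot_count, mean_nddi) for clusters with >= min_spots.
    """
    n = len(spots)
    label = list(range(n))  # union-find parent table

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]  # path halving
            i = label[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            close_x = abs(spots[i][0] - spots[j][0]) <= window_mm
            close_y = abs(spots[i][1] - spots[j][1]) <= window_mm
            if close_x and close_y:
                label[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(spots[i])

    clusters = []
    for members in groups.values():
        if len(members) >= min_spots:  # 1 or 2 isolated spots are rejected
            mean_nddi = sum(s[2] for s in members) / len(members)
            clusters.append((len(members), mean_nddi))
    return clusters


# Hypothetical detections: three nearby spots and an isolated pair.
spots = [(0, 0, 0.9), (2, 3, 0.8), (5, 1, 0.7), (50, 50, 0.95), (52, 51, 0.6)]
clusters = cluster_spots(spots)
```

Here the isolated pair is discarded even though one of its spots has a high individual score, which mirrors the rejection of suspected clusters with only 1 or 2 spots.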

4.2.  Detection  of  Lung  Nodules 

The first group ($A_l$) of image blocks was extracted from 31 chest radiographs containing multiple nodules. A senior radiologist selected 91 true nodules and 247 non-nodule areas. The second group ($B_l$) was collected from 31 images containing 95 nodules and 258 non-nodules and was confirmed by biopsy or by follow-up showing growth of the nodule. The pre-scan process was performed first to locate the center of the high intensity island and isolate the image block for training. For the training, each original and its 7 “brother” image blocks shared the same score vector (probability of a disease and output association) pre-determined by the radiologist. During the training, the original and its 7 “brother” image blocks were entered as a group in the same sequence. Tables 3 and 4 show the performance of using different neural network techniques and corresponding enhancement methods (i.e., fuzzy output training).


Table 3. Performance of neural networks in the detection of lung nodules using group $A_l$ as training set and group $B_l$ as testing set.

Neural networks                     DYSTAL   BP/OH   BP/IH   CNN    CNN/FUZZY
$A_z$ (Area under the ROC curve)    0.56     0.58    0.68    0.82   0.89
Detection accuracy
  (% true-positive detection)       60       60      70      80     80
  (# false-positive per image)      7        6.6     5       4      2.5

Table 4. Performance of neural networks in the detection of lung nodules using group $B_l$ as training set and group $A_l$ as testing set.

Neural networks                     DYSTAL   BP/OH   BP/IH   CNN    CNN/FUZZY
$A_z$ (Area under the ROC curve)    0.57     0.61    0.70    0.83   0.88
Detection accuracy
  (% true-positive detection)       60       65      70      80     80
  (# false-positive per image)      7        6.5     4.8     4      2.5



These comparison studies of both diseases imply that pattern classifiers such as DYSTAL and BP/OH cannot function alone to analyze image blocks (patches) with substantial background structures. Once the feature extraction procedure was added, the performance of the neural network increased, as evident in the results of BP/IH, CNN, and CNN/FUZZY in Tables 1-4. We also learned that the convolution for two-dimensional feature extraction and fuzzy training guided by radiologists’ determination were successful methods to improve the disease detection. With the neural network used in these studies, we could not isolate which procedure, the feature extraction or the final classification, was improved by the CNN training.

5.  Discussion  and  Conclusions 

Medical image pattern recognition using extracted features for input has been proposed in the detection of disease patterns [20]. Since only a small number of inputs are used, the computational time can be much less than that of the CNN for the training. As long as the features of a disease pattern are well-defined and can be quantified as values or vectors, many neural network techniques should be able to classify them. On the other hand, the CNN can internally extract features of disease patterns and is capable of distinguishing non-disease from disease patterns. A potential advantage of using the CNN is that once trained kernels are analyzed, feature extraction can be specifically defined not only by the users’ experience but also by the confirmation of the CNN.

In this study, we have utilized the CNN in conjunction with several effective training methods: (i) providing a radiologists’ rating scale for the training of neural nets, (ii) introducing the neural network with the classification invariance of input matrix operations, (iii) using output association functions to fuzzify the radiologists’ determination and to establish the relationship between adjacent output nodes, and (iv) rendering trainer imposed functions to enhance the performance of the neural network. We found that the performance of the CNN in detecting both diseases improved significantly by administering these training methods.

Considering the convolution operation as a feature extraction processing from the input layer to the hidden layer in the CNN, we found that feature extraction is an important procedure to assist the classifier (e.g., conventional BP) in performing the recognition task. Pattern classifiers, including those newly developed neural networks, would not be able to distinguish “highly contaminated” feature vectors. Approximately 40% of lung nodules are superimposed on a posterior rib with various orientations. However, less than 10% of microcalcifications in our database are obstructed by other abrupt breast tissues. In addition, the probability of having end-on vessels, which resemble nodules, on a chest radiograph is higher than that of film defects, which resemble microcalcifications. In other words, chest radiographs contain many more background structures than mammograms do. These background structures will contaminate the feature vector and lead to degradation of the machine observers’ performance. Clinical studies also indicated that highly experienced human observers can detect only 68% of lung nodules [1] and 95% of clustered microcalcifications [5].

Abstract - A generalized decomposition method (H+PC) based on Haar transform has been derived. This general form can exactly describe dyadic transforms. Another general form (B+PC), which is a subset of the doublet system, based on the binomial filter can describe triplet-type decompositions including whole point symmetric biorthogonal transformations. Both systems can be unified by the delta function basis decomposition system (D+PC). In this paper, we found that these three bases and their expansions using predictive approximation form the dyadic decomposition family. Wavelet and integer wavelet based decomposition methods can also be included in this unified framework. This framework clearly bridges the relationship among various types of dyadic transforms. We show that many dyadic decompositions can be directly computed from their basis. A model of adaptive decomposition is also presented. Theoretical development and detailed computational methods are given. The property of low entropy in the decomposed data sequence is used as a major criterion for comparing various dyadic transforms. For the readers' convenience, the nomenclature of the symbols used in this paper is appended.

Index  Terms  -  Dyadic  Decomposition,  Data  Compression,  Haar 
Transform,  and  Binomial  Transform. 

I. Introduction

In  the  past  two  decades,  the  applications  of  sub-band  and 
wavelet  decomposition  methods  for  data  compression  have 
been  extensively  discussed  in  the  literature  [1,2].  The 
development  of  coding  methods  based  on  spatial-temporal 
correlation  for  multi-resolution  decomposition  pyramid  has 
made  those  compactly  supported  transforms  effective  for 
image  data  compression  [3,4].  Recently,  we  found  that 
significantly  different  compression  efficiencies  were 
obtained  while  decomposing  an  image  pattern  with  different 
orthogonal  wavelets  [5].  We  believe  that  it  is  essential  to 
investigate  the  relationship  among  dyadic  decomposition 
methods  prior  to  further  analysis  of  the  data  characteristics 
using  different  decomposition  methods.  Through  this  study, 
we  found  that  a  predictive  approximation  [6,7]  can  be  added 
onto  basic  decomposition  methods  such  as  Haar  and 
binomial  transforms  and  form  generalized  transformation 
systems.  For  each  pair  of  mother  scale  and  wavelet 
functions,  there  exists  a  set  of  weights  to  convert  the 
transform  results  obtained  from  the  base  transformation  into 
a  wavelet  transformation.  With  constraints  imposed  on  the 
filter  bank,  the  generalized  system  is  perfectly 
reconstructable  (PR)  and  is  capable  of  producing  low 
entropy.  Both  characteristics  are  basic  requirements  of  the 
decomposition  in  a  meaningful  data  compression  scheme. 
Due  to  the  broad  area  of  signal  decomposition,  we  chose  to 
focus  on  discrete  dyadic  decompositions  in  this  paper. 

II. Haar Transform and Its Variants

The discrete Haar transform, which is formed by a doublet pair (1,1) and (1,-1), is one of the simplest reversible transforms. For a given data sequence X: ($x_i$, i = 0, ..., N-1), the discrete Haar transformation splits the data sequence into two sequences by down-sampling:

$l_n = (x_{2n+1} + x_{2n})/\sqrt{2}$, n = 0, 1, ..., (N/2)-1 ...(1)

and $t_n = (x_{2n+1} - x_{2n})/\sqrt{2}$, n = 0, 1, ..., (N/2)-1. ...(2)

The reconstruction of the pair elements ($x_{2n+1}$, $x_{2n}$) possesses identical forms of the above two processes through up-sampling.

The binomial transform is another basic transform, which is composed of two Haar bases and forms a triplet pair: (1,2,1) (i.e., $(1,1) \otimes (1,1)$) for low-pass and (1,-2,1) (i.e., $(1,-1) \otimes (1,-1)$) for high-pass. Using the dyadic decomposition format, we define these two pairs of bases (four basic components) as:

$L_h = X \otimes (1,1) \downarrow 2$,  $H_h = X \otimes (1,-1) \downarrow 2$

$L_b = X \otimes (1,2,1) \downarrow 2$,  $H_b = X \otimes (-1,2,-1) \downarrow 2$

In the following sections, we will show that dyadic decomposition methods can be extended from these two pairs of bases through various combinations. Strictly speaking, the Haar transform is the most fundamental form since the triplet pair can be obtained by convolving two Haar pairs.
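As a small numerical check of the statements above, the sketch below implements the Haar split of Eqs. (1) and (2) with its inverse, and verifies that convolving two Haar doublets gives the binomial triplets. It is an illustrative sketch only; the function names are assumptions, and the input length is assumed to be even.

```python
import math


def haar_forward(x):
    """Split x into low-pass and high-pass sequences, Eqs. (1) and (2)."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    low = [(x[2 * n + 1] + x[2 * n]) / s for n in range(half)]
    high = [(x[2 * n + 1] - x[2 * n]) / s for n in range(half)]
    return low, high


def haar_inverse(low, high):
    """Rebuild the pair (x_2n, x_2n+1) from each (l_n, t_n)."""
    s = math.sqrt(2.0)
    x = []
    for l, t in zip(low, high):
        x.append((l - t) / s)  # x_2n
        x.append((l + t) / s)  # x_2n+1
    return x


def convolve(a, b):
    """Full discrete convolution of two short filters."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


data = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = haar_forward(data)
restored = haar_inverse(low, high)
low_triplet = convolve([1, 1], [1, 1])
high_triplet = convolve([1, -1], [1, -1])
```

The round trip reproduces the data up to floating-point rounding, which is the reversibility the text relies on.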

III.  High-Pass  Decomposition  Based  on  Haar  Transform 
(H+P  and  S+P  Transforms) 

For an integer data sequence, the discrete Haar transform can be approximated by the Sequential (S) transform [8]. Basically, the S transform computes (a) the average of the two adjacent elements of the integer data sequence as the low-pass component and (b) the difference as the high-pass component. More specifically, the former is the truncated integer of the average value. Hence Eqs. (1) and (2) can be rewritten as:

$b_n = \lfloor (x_{2n} + x_{2n+1})/2 \rfloor$ ...(3)

and $d_n = x_{2n+1} - x_{2n}$, ...(4)

where $\lfloor \cdot \rfloor$ stands for a truncation operation that turns a real number into an integer. The corresponding inverse operations are:

$x_{2n+1} = b_n + \lfloor (d_n + 1)/2 \rfloor$ ...(5)

and $x_{2n} = x_{2n+1} - d_n$. ...(6)
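The reversibility of Eqs. (3) to (6) can be checked with a few lines of integer arithmetic. The sketch below is illustrative only: the function names are assumptions, the input length is assumed to be even, and the truncation is interpreted as rounding toward minus infinity (Python's `//`), which is the interpretation under which Eqs. (5) and (6) also hold for negative values.

```python
def s_forward(x):
    """S transform: truncated average, Eq. (3), and difference, Eq. (4)."""
    half = len(x) // 2
    low = [(x[2 * n] + x[2 * n + 1]) // 2 for n in range(half)]
    high = [x[2 * n + 1] - x[2 * n] for n in range(half)]
    return low, high


def s_inverse(low, high):
    """Inverse S transform, Eqs. (5) and (6)."""
    x = []
    for b, d in zip(low, high):
        odd = b + (d + 1) // 2  # x_2n+1, Eq. (5)
        even = odd - d          # x_2n,   Eq. (6)
        x.extend([even, odd])
    return x


data = [3, 8, 8, 3, -5, 2, 7, 7]
low, high = s_forward(data)
restored = s_inverse(low, high)
```

The parity of the sum $x_{2n} + x_{2n+1}$ equals the parity of the difference $d_n$, so the bit lost by the truncation in Eq. (3) is recovered from $d_n$ in Eq. (5).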

III.A. Prediction in the high-pass component through Haar transform

Said and Pearlman added a predictive term, $e_n$, onto Eq. (4) attempting to further reduce the first-order entropy in the high-pass process [6, 7]. In the context of reversible operation, the prediction in the high-pass component can be further generalized ($Ge_n$):

$\hat{d}_n = (x_{2n+1} - x_{2n}) + \lfloor Ge_n + 1/2 \rfloor = d_n + \lfloor Ge_n + 1/2 \rfloor$ ...(7)

Eqs. (3) and (7) create a general form for S+P (i.e., S plus prediction) transforms. Eq. (8) is the non-truncation version of Eq. (7) for H+P (i.e., Haar plus prediction) transforms:

$\hat{t}_n = (x_{2n+1} - x_{2n}) + Ge_n = \bar{t}_n + Ge_n$ ...(8)

where $\bar{t}_n$ is the same as $d_n$. Its corresponding low-pass counterpart is $\bar{l}_n$, which is the non-truncation value of $b_n$. The estimation term, $Ge_n$, which is a component in the new difference value, $\hat{t}_n$, can be expressed by the decomposed values of its neighboring elements using polynomial terms:

$Ge_n = \sum_r \left( \sum_{i=-n}^{N/2-1-n} {}_r\alpha_i (\bar{l}_{n+i})^r + \sum_{j=-n}^{-1} {}_r\beta_j (\bar{t}_{n+j})^r \right)$ ...(9)

Eq. (9) does not guarantee that an arbitrary set of ${}_r\alpha_i$ and ${}_r\beta_j$ can produce low first-order entropy. In practice, only certain sets of ${}_r\alpha_i$ and ${}_r\beta_j$ can produce low global entropy. This equation indicates that the predictive value of $Ge_n$ can be composed by low-pass and high-pass components of the discrete Haar transform. Usually, it only takes a few neighboring elements to compute the predictive value. Typically, the corresponding ${}_r\alpha_i$ and ${}_r\beta_j$ are small weights in order to produce a low entropy. In [6], Said and Pearlman suggested using the linear terms only (i.e., ${}_r\alpha_i = 0$ and ${}_r\beta_j = 0$ for $r \neq 1$):

$Ge_n = e_n = \sum_{i=-n}^{N/2-1-n} \alpha_i \bar{l}_{n+i} + \sum_{j=-n}^{-1} \beta_j \bar{t}_{n+j}$ ...(10)

They  also  gave  several  examples  for  S+P  transform  [6,7], 
which  empirically  produce  low  entropy,  as  shown  in  Table  I. 

Table I. Examples of Contribution Weights Suggested by Said and Pearlman.

Examples of     Contribution terms
Prediction      $\alpha_1$   $\alpha_0$   $\alpha_{-1}$   $\alpha_{-2}$   $\beta_{-1}$
A               0            1/4          0               -1/4            0
B               0            1/4          -1/8            -3/8            1/4
C               -1/16        5/16         1/4             -1/2            3/8

Remark: The contribution terms given by [6] are scaled differently from Eq. (10). $\beta_0 = 1$ for all S+P cases.


Providing the property of perfect reconstruction, Eq. (8) is the high-pass wing of a generalized dyadic transform. Its reconstruction process can be obtained by reversing the computing order in the inverse H+P transform:

$\bar{t}_n = \hat{t}_n - Ge_n = \hat{t}_n - \sum_r \left( \sum_{i=-n}^{N/2-1-n} {}_r\alpha_i (\bar{l}_{n+i})^r + \sum_{j=-n}^{-1} {}_r\beta_j (\bar{t}_{n+j})^r \right)$ ...(11)

and its counterpart in the inverse S+P transform is:

$d_n = \hat{d}_n - \lfloor Ge_n + 1/2 \rfloor = \hat{d}_n - \left\lfloor \sum_r \left( \sum_{i=-n}^{N/2-1-n} {}_r\alpha_i (b_{n+i})^r + \sum_{j=-n}^{-1} {}_r\beta_j (d_{n+j})^r \right) + 1/2 \right\rfloor$ ...(12)

The average values, $\bar{l}_{n+i}$, are always available during the reconstruction of $\bar{t}_n$. However, only those $\bar{t}_{n+j}$, where j = -n, -n+1, ..., 0, are available while computing reconstruction values from low to high indices. The inverse S+P transform shares the same context.
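The availability argument above (all averages are known, but differences become known only from low to high indices) can be illustrated with a small reversible S+P sketch using linear prediction terms. Everything specific here is an assumption made for illustration: the weights in `ALPHA` and `BETA` are hypothetical and are not taken from Table I, neighbors falling outside the sequence are simply skipped, and exact rational arithmetic is used so that the rounding is identical in both directions.

```python
from fractions import Fraction
from math import floor

# Hypothetical illustrative weights (not the Table I values).
ALPHA = {-1: Fraction(1, 4), 1: Fraction(-1, 4)}  # on averages b[n+i]
BETA = {-1: Fraction(1, 4)}                       # on earlier differences d[n+j], j < 0


def s_forward(x):
    half = len(x) // 2
    low = [(x[2 * n] + x[2 * n + 1]) // 2 for n in range(half)]
    high = [x[2 * n + 1] - x[2 * n] for n in range(half)]
    return low, high


def s_inverse(low, high):
    x = []
    for b, d in zip(low, high):
        odd = b + (d + 1) // 2
        x.extend([odd - d, odd])
    return x


def prediction(n, low, diffs):
    """Linear prediction term from averages and already-known differences."""
    total = Fraction(0)
    for i, w in ALPHA.items():
        if 0 <= n + i < len(low):  # neighbors outside the sequence are skipped
            total += w * low[n + i]
    for j, w in BETA.items():
        if n + j >= 0:
            total += w * diffs[n + j]
    return total


def sp_forward(x):
    """Add the rounded prediction to each difference, as in Eq. (7)."""
    low, d = s_forward(x)
    half = Fraction(1, 2)
    d_hat = [d[n] + floor(prediction(n, low, d) + half) for n in range(len(d))]
    return low, d_hat


def sp_inverse(low, d_hat):
    """Recover differences from low to high indices, then invert the S transform."""
    half = Fraction(1, 2)
    d = []
    for n in range(len(d_hat)):
        d.append(d_hat[n] - floor(prediction(n, low, d) + half))
    return s_inverse(low, d)


data = [10, 12, 15, 19, 24, 30, 37, 45]
low, d_hat = sp_forward(data)
restored = sp_inverse(low, d_hat)
```

Because the decoder rebuilds each difference before it is needed by the next prediction, the same integer correction is subtracted that the encoder added, and the reconstruction is exact for any choice of weights.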

For  data  compression,  particularly  for  lossless 
compression,  a  minimum  requirement  for  data 
decomposition  is  that  decomposition  operation  must  be 
reversible.  Since  Haar  and  S  bases  are  reversible,  Eqs.  (8) 
and  (9)  provide  a  new  dimension  for  generalization  of  the 
system.  In  the  following  section,  we  show  this 
generalization  approach  and  attempt  to  link  it  with  the  two- 
band  and  orthogonal  wavelet  decompositions.  Furthermore, 
Eq.  (8)  as  a  generalized  high-pass  form  of  Haar  based  dyadic 
decompositions  implies  that  implementation  of  switching 
different  H+P  transforms  can  be  performed.  The  use  of 
different  sets  of  S+P  transforms  on  the  same  data  sequence 
is also allowed with Eq. (7). This is because we can convert decomposed values between two transformation systems using two sets of $\alpha_i$ and $\beta_j$ while operating different transforms on different characteristics of data segments. We will further discuss this application in Section VII.

III.B.  High-pass  decomposition  in  S+P  and  H+P  transforms 

High-pass coefficients in a generalized system are obtained by adding a predictive term in Haar or S transform. Even beginning with an integer data sequence, the high-pass coefficients of S+P are different from those obtained from H+P. Nevertheless, the decomposed data are highly correlated. The decomposed data in the two low-pass domains are slightly different in the truncated lowest bit. They also differ in the scaling factor of $1/\sqrt{2}$ (i.e., $b_n \approx l_n/\sqrt{2}$). In high-pass domains, however, the scaling factor is the main difference between the two systems (i.e., $d_n = \bar{t}_n = \sqrt{2}\,t_n$). It is seen that the decomposed coefficients in the low-pass and high-pass domains are scaled independently. The use of different scaling factors is mainly due to the requirement of integer computing in S family systems.

In the following subsection, we use predictive terms (i.e., Eqs. (7) and (8)) to expand high-pass processes as a part of generalization in dyadic decomposition. The low-pass processes remain unchanged so that the computation in H+P and its approximation in S+P follow Eqs. (1) and (3), respectively. The reconstruction process can be arranged by inverse sequential operation or by designing an orthogonal based low-pass to satisfy the condition $\sum_j h_j h_{j-2i} = \delta_i$ for $\forall i$ [9].

III.C. Reformulating m-tap orthogonal filter coefficients ($h_v$, $g_v$) in wavelet transform

Since  and  are  two  decomposed  components  of 
Haar  transform,  Eqs.  (1)  and  (8)  representing  the  two  wings 
of  H+P  transform  can  be  computed  through  scaled  Haar 

bases  ( and  ),  we  can  employ  scaled  Haar  bases  to 
develop  S+P  and  H+P  transform  systems.  As  an  example, 
the  high-pass  process  of  an  m-tap  orthogonal  wavelet 
filtering  can  be  made  equivalent  to  that  of  an  H+P  transform 
(i.e.,  Eq.  (8)).  Providing  an  m-tap  mother  scale  function 

a  unique  set  of  and  ySy  can  be  found  to 

match  the  wavelet  function  (g^,  v=0J,..,m-l  for 

).  By  omitting  the  convolution 

operations  with  the  data  vector  x  for  the  both  sides  of  the 
equation,  we  let 


to  match  the  case  [9].  There  are  two  physical  meanings 
associated  with  this  situation:  (a)  the  sum  of  contribution 
weights  is  0  and  (b)  the  contribution  value  by  the  neighbors 
can  be  reformatted  by  difference  values  of  the  average 
values.  It  is  seen  that  the  prediction  can  be  made  not  only 
by  the  difference  values  of  the  adjacent  elements  but  also 

from  the  difference  values  of  the  average  values  (i.e.,  or 
b^).  This  certainly  is  an  effective  strategy  for  making  a 
good  prediction  through  a  decomposition  method. 


...(13) 


+(...  1,  i,  0,  0)x«_, 

+  (...  0,  0,  i,  i)xao 

+  ( . 

+  (...-1,  1,  0,  0)xjS_i 

+  (...  0,  0,-1,  l)xPo 

=  (...  —  ,/l2 9  > ^0 ^ 

where  C  is  the  offset  scaling  factor  mentioned  earlier.  Based 
on  Eqs.  (8)  and  (10),  b„={...  0,  0,  i,  |)®X  . 

Vi=(-  I-  Y’  0-  0)®xJn=(-  0.  1)®^’ 

1,  1,  0,  0)®X,  and  so  on;  where 
0  represents  the  convolution.  In  addition,  should  be 
unity.  The  rest  of  the  contribution  weights  (i.e.,  or,  and  Pj  ) 
as  well  as  C  can  be  solved  in  terms  of  the  filter  coefficients 
(i.e.,  )  of  an  m-tap  transformation.  Specifically: 

C  =  2/(/io +^1)9  oCq  =2(/zo-^i)/(^o+^i)  A) 
a.  =  lih^i  -  )  /(/zq  +  )  and 

Ay  =(^2;+^2;+l)/(^0+'^l)  -(14) 

This  solution  indicates  that  Eq.  (8)  is  the  high-pass  wing  of 
the  generalized  decomposition  form  that  covers  high-pass  of 
the  orthogonal  wavelet  transforms.  The  computed  values 
using  Eq.  (8)  with  weights  given  by  Eq.  (14)  are  exactly  the 
same  as  the  high-pass  coefficients  through  wavelet  transform 
multiplied  by  the  constant  C.  Depending  upon  the  low-pass 
component,  the  inverse  transformation  can  be  made  through 
inverse  sequential  process  or  its  corresponding  wavelet-based 
synthetic  process.  The  synthetic  filter  coefficients  are  usually 
rearranged  through  the  decomposition  filter  coefficients. 
Without  a  corresponding  decomposition  in  the  low-pass 
wing,  the  use  of  inverse  sequential  process  is  necessary  for 
perfect  reconstruction  in  S+P  and  H+P  systems.  In  fact, 
inverse  sequential  operation  is  the  only  way  for  an  S+P 
system  to  be  perfectly  reconstructable.  A  similar  algorithm 
development  can  be  derived  to  associate  doublet-type  two- 
band  filter  with  H+P  systems. 
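
The weights of Eq. (14) can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, the D4 coefficients are the standard closed-form Daubechies values, and the index convention (weights stored at indices 0, -1, -2, ...) is an assumption chosen to match the index column of Table II.

```python
import math

def daubechies4():
    """Standard closed-form D4 low-pass coefficients h_0..h_3."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]

def predictive_weights(h):
    """Return C and the weights alpha, beta of Eq. (14) for an even-length filter h.

    alpha and beta are dictionaries keyed by the (non-positive) indices
    used in Table II; this keying is an assumption of the sketch.
    """
    s = h[0] + h[1]
    C = 2.0 / s
    pairs = len(h) // 2
    alpha = {-i: 2.0 * (h[2 * i] - h[2 * i + 1]) / s for i in range(pairs)}
    beta = {-j: (h[2 * j] + h[2 * j + 1]) / s for j in range(pairs)}
    return C, alpha, beta

C, alpha, beta = predictive_weights(daubechies4())
print(C, alpha, beta)
```

For D4 the printed values agree with the first rows of Table II (C, the two predictive weights at index 0, and the two at index -1), and beta at index 0 comes out as unity, as the text requires.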

From  Eq.  (14),  we  can  easily  find  that 

property  of  zero-mean  of 

j  « 

high-pass  filtering  is  maintained  (i.e.,  =0) 

u 

orthogonal  wavelet  or  a  two-band  system,  must  vanish 


IV.  High-Pass  in  Binomial  Based  Decomposition  (B+P 
and  Sb+P  Transforms) 

Instead of operating on two adjacent elements of the data sequence as in the Haar transform, binomial family systems possessing symmetric filters operate on an odd number of adjacent elements for each set of convolution computations. The binomial basis is formed by a triplet pair, (1,2,1) and (-1,2,-1), mentioned earlier (Section II). In this paper, the binomial filter based transform using S transform operations is called the Sb transform. The transform operations, corresponding to Eqs. (3) and (4), for the average and difference values are given below:

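$$b'_n = \lfloor (x_{2n-1} + 2x_{2n} + x_{2n+1})/4 \rfloor = \lfloor (x_{2n} + \lfloor (x_{2n-1} + x_{2n+1})/2 \rfloor)/2 \rfloor, \quad n = 1, 2, \ldots, (N/2)-1,$$ ...(15)

$$d'_n = 2x_{2n} - x_{2n-1} - x_{2n+1}, \quad n = 1, 2, \ldots, (N/2)-1.$$ ...(16)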

In  fact,  d'„  is  a  composed  difference  value.  In  practice,  Bq 
follows  Eq.  (3),  and  d^  follows  Eq.  (4)  multiplied  by  2.  In 

this  paper,  we  use  and  for  the  corresponding 
decomposed  low-pass  and  high-pass  values  of  binomial 
transform  and  use  and  for  the  non-truncation  values 
of  and  ,  respectively.  The  inverse  operations  of  the 
Sb  transform  are  given  below: 

$$x_{2n} = b'_n + \lfloor (d'_n + 3)/4 \rfloor$$ ...(17)

and

$$x_{2n+1} = 2x_{2n} - d'_n - x_{2n-1},$$ ...(18)

where $x_{2n-1}$ is obtained from the last computation (i.e., $2x_{2n-2} - d'_{n-1} - x_{2n-3}$) during the reconstruction process.
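
To make the sequential reconstruction of Eqs. (15)-(18) concrete, the following is a minimal Python sketch on an integer sequence of even length. The function names are hypothetical, and storing $x_0$ and $x_1$ unchanged as seeds for the reconstruction is an assumption of this sketch (the text does not specify the boundary handling).

```python
def sb_forward(x):
    """Sb transform of an even-length integer list, Eqs. (15) and (16).

    Returns the two seed samples x[0], x[1] (an assumed boundary choice)
    and the lists b', d' for n = 1, ..., N/2 - 1.
    """
    b, d = [], []
    for n in range(1, len(x) // 2):
        left, mid, right = x[2 * n - 1], x[2 * n], x[2 * n + 1]
        b.append((left + 2 * mid + right) // 4)  # floor division, Eq. (15)
        d.append(2 * mid - left - right)         # Eq. (16)
    return x[0], x[1], b, d

def sb_inverse(x0, x1, b, d):
    """Sequential inverse of sb_forward, Eqs. (17) and (18)."""
    x = [x0, x1]
    for bn, dn in zip(b, d):
        mid = bn + (dn + 3) // 4       # Eq. (17)
        right = 2 * mid - dn - x[-1]   # Eq. (18); x[-1] holds x_{2n-1}
        x.extend([mid, right])
    return x

data = [3, 7, 1, 8, 250, 0, 12, 12]
restored = sb_inverse(*sb_forward(data))
print(restored == data)
```

The round trip is exact because $2x_{2n} + x_{2n-1} + x_{2n+1} = 4x_{2n} - d'_n$, so the truncated average and the difference together determine $x_{2n}$; Python's `//` is a floor division, which matches the downward truncation used in the text.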

IV.A. Prediction in the high-pass process of binomial decomposition - Sb+P and B+P [10,11,12]

There is a subset of binomial based filters whose filter bank possesses a property of

(-l)'^2«.l  =(-l)'^-2«+l  This  subset  system  can  be 
described  by  B+P  transform.  The  reconstruction  process 
can  be  arranged  by  inverse  sequential  operation  or  designing 
a  biorthogonal  based  low-pass  to  satisfy  the  condition 

YKK-21  =  Ao  M  subset  includes  whole 

n 

point  symmetric  (WPS)  filters  (i.e.,  k2u-n  -K  ^ 
constant  ^eZ^nd  V/z)-  Half  point  symmetric  (HPS) 
filters  (i.e.,  ^2«+i-n  ~ ^  constant  Z  and  V/i )  and 
other non-binomial filters cannot be directly expanded from the triplet system.

We can add an estimation term onto Eq. (16), as in turning Eq. (2) into Eq. (7) in the doublet system, to further reduce the first-order entropy:

=  (2^2n  “  ■^2n-I  “  -*^2n+l  )  +  \p^'n  +  1  /  2  J  = 

<+LG<+I/2J  ...(19) 

The  estimation  term,  Ge„,  in  the  high-pass  of  Sb+P 

transform,  d'„,  can  be  expressed  by  its  neighbors  and 
associated  values  in  the  decomposed  data  sequence.  In  fact, 
Ge„  shares  the  same  form  of  Ge„  given  by  Eq.  (9).  The 

reconstruction  process  for  d'„  shares  the  same  form  of  d„  as 
shown in Eq. (12). However, the new expressions of $Ge_n$ and $d'_n$ contain the coefficients of the Sb+P decomposition rather than the coefficients of the S+P decomposition. The non-truncation process of the above derivations (i.e., Eqs. (15)-(19)) is called the B+P transform herein.

The  decomposed  coefficients  in  the  low-pass  process  are 
multiplied  by  a  scaling  a  factor  of 

~  ^  )•  In  high-pass  domains,  the  scaling  factor 

is  the  main  difference  between  these  two  binomial  systems 
(i.e.,  d'„  =7„'  =  2-\/2r' ).  Again,  due  to  the  requirement  of 
integer  computing  in  the  S,  family,  their  scaling  factors  are 
different  in  these  two  versions  of  binomial  decompositions. 

Eq. (15) and its non-truncation version represent the low-pass decompositions for the Sb+P and B+P transforms, respectively. The general form of the high-pass component in a binomial system follows the non-truncation version of Eq. (19).


IV.B. Reformulating m'-tap biorthogonal wavelet coefficients ($k_u$, u = -m,...,0,...,m, and m = (m'-1)/2)


Based  upon  the  symmetric  indices  used  in  general 
binomial  filter,  we  turn  Eq.  (9)  into  a  locally  symmetric 
operation  to  facilitate  the  study  between  the  predictive 
values  and  triplet-type  filter  coefficients.  A  generalized 
predictive  expression  for  the  B+P  transform  is  given  by: 


t'=t'  +  Ge„ 


...(20) 


'^here  Gc;  =S(  +  'x 

r  i=-m 


j=-m 


m  =  (m  -I- 1)  /  2 .  A  reduced  polynomial  expression  using  the 


first-order  terms  is 


i=-m  ;=-m  (21) 

For the Sb+P transform, $b'_n$ and $d'_n$ would replace $l'_n$ and $t'_n$, respectively. The low-pass wing of a B+P transform is the same as the binomial low-pass component, that is, the non-truncation version of Eq. (15). Given a set of m'-tap filter coefficients, a set of $\alpha'$ and $\beta'$ can be found so that the high-pass process of a B+P transform can be made equivalent to that of an m'-tap biorthogonal wavelet. This statement can be proven as shown below. We arrange the filter length (m' or m'+2) and the length of contribution weights (m), so that m = (m'-1)/2 if mod(m'+1, 4) = 0; otherwise m = (m'+1)/2. In the latter case, we add two terms $k_m = k_{-m} = 0$, so that the m'-tap filter becomes an (m'+2)-tap filter in the following derivation.


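$$
\begin{aligned}
 &(\tfrac{1}{4},\ \tfrac{1}{2},\ \tfrac{1}{4},\ 0,\ \ldots,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ \ldots) \times \alpha'_{-m} \\
+\ &(\ldots,\ \tfrac{1}{4},\ \tfrac{1}{2},\ \tfrac{1}{4},\ 0,\ 0,\ 0,\ 0,\ \ldots) \times \alpha'_{-1} \\
+\ &(\ldots,\ 0,\ 0,\ \tfrac{1}{4},\ \tfrac{1}{2},\ \tfrac{1}{4},\ 0,\ 0,\ \ldots) \times \alpha'_{0} \\
+\ &(\ldots,\ 0,\ 0,\ 0,\ 0,\ \tfrac{1}{4},\ \tfrac{1}{2},\ \tfrac{1}{4},\ \ldots) \times \alpha'_{1} \\
+\ &(\ldots,\ \tfrac{1}{4},\ \tfrac{1}{2},\ \tfrac{1}{4}) \times \alpha'_{m} \\
+\ &(-1,\ 2,\ -1,\ 0,\ \ldots,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ \ldots) \times \beta'_{-m} \\
+\ &(\ldots,\ -1,\ 2,\ -1,\ 0,\ 0,\ 0,\ 0,\ \ldots) \times \beta'_{-1} \\
+\ &(\ldots,\ 0,\ 0,\ -1,\ 2,\ -1,\ 0,\ 0,\ \ldots) \times \beta'_{0} \\
+\ &(\ldots,\ 0,\ 0,\ 0,\ 0,\ -1,\ 2,\ -1,\ \ldots) \times \beta'_{1} \\
+\ &(\ldots,\ -1,\ 2,\ -1) \times \beta'_{m} \\
=\ &(-k_{-m},\ \ldots,\ -k_{m}) \times C'
\end{aligned}
$$ ...(22)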

where  C’  is  the  scaling  factor  mentioned  earlier  and 

=  hQ+  X^“I)“fi2i(+i  -  Let^'  be  unity  for  the 
/  L  tt=-m  J 

central  term  of  second  part  of  the  summation  representing 
7„' .  The  solution  would  be  gTo  =  2itQCo  -  4 ,  where 

Cq  =  ^  Hq 

/  L  u=-m  ^ 

The  other  contribution  weights  are 
^1+1  ^2i+2)Q]~ 

and 

^  [(^2  j  2^2 ;+l  f  ^2 y+2)^0  ^ P'j' 

Since  C’  serves  as  a  scaling  factor  between  two 
decomposition  systems,  one  can  use  it  to  adjust  the  range  of 
decomposed  coefficients  for  different  applications.  From 
Eq.  (23),  we  can  obtain  that  =  2C''Z(-l)^k  •  Since 

i  u 

=0  &s  the  property  of  zero-mean  filtering  in  a 


...(23) 


biorthogonal  wavelet  system,  vanishes  [10].  The 

i 

above  formulation  does  not  meet  the  PR  criterion  through 
the  sequential  decomposition  and  reconstruction  embedded 
in  Eqs.  (19)  and  (20).  With  a  corresponding  low-pass 
component  (see  section  VI),  the  inverse  transformation  of 
the  B+P  system  is  perfectly  reconstructable  through  its 
biorthogonal  synthetic  process.  The  length  of  synthetic  filter 
and  its  coefficients  usually  differ  from  those  of  the 
decomposition  (analytic)  filter. 
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If  we  let  be  unity  for  the  last  term  of  the  summation 

representing  The  remaining  weights  (i.e.,  a'  and 
as  well  as  the  C’  value  can  be  solved  in  terms  of  the  filter 
coefficients  (i.e.,  k^)  of  the  m-tap  triplet-based 

transformation.  A  set  the  predictive  weights  can  be 
obtained: 

C'yn=^l{K^K-\)^ 

“  [(^2i  ~  2^2i+1  ^2i+2)^m  ]“  ®i+l  ...(24) 

and 

P'  =  [{k^j  +2*2,.,,  +*2;+2)C^/4]-,0;.„. 

With this decomposition, the sequential operation is reversible. Unfortunately, the above formulation would produce large coefficient values, particularly in a long filter system, because biorthogonal wavelets possess the property $|k_{\pm u}| > |k_{\pm u \pm p}|$ for $u \times p > 0$. In [12], $|k_0 / k_{\pm m}|$ ranges from 6 for 5-tap to 90 for 13-tap filters while designing effective biorthogonal filter coefficients.

Based on Eq. (23), an approximation can be made by folding the filter coefficients for the sequential reconstruction required by the Sb family. We assume that the data sequence is mirror-symmetric for each convolution operation. This assumption is false in reality. However, it can still produce a good prediction for Eq. (20) as an approximation. With this assumption, the property of perfect reconstruction is restored in Eq. (19) through the sequential operation. Folding the filter coefficients is also equivalent to turning the triplet system into a doublet system so that
=  2k_^  and  K^=0  for  u=l,..  m  and  Kq  =  This 
approximation also turns a non-causal filtering process into a causal process, which is required in an Sb based transform for the purposes of perfect reconstruction. The contribution weights with negative indices in Eq. (24) become

=4/^0’ 

a^=*oC^/16  and 

-  2/:_2m  +  -  <  .(25) 

and 

PLj.,  =  +2/^-22_1  +^^-2;-2)Co/4]-yS:,.. 

Other  weights  with  positive  indices  vanish  (i.e,,  a' =  P'j-O 
for  i>0orj>0). 

The  above  three  equations,  each  as  a  high-pass 
component  of  the  system,  serve  different  purposes:  Eq.  (23) 
is  used  in  B+P  (and  B+PC  to  be  discussed)  transform;  Eq. 
(24)  can  be  used  for  Sb+P  and  B+P  transforms;  and  Eq.  (25) 
is  an  approximation  version  for  the  Sb+P  and  B+P 
transforms.  Strictly  speaking,  only  Eq.  (23)  shows  the 
relationship  between  the  high-pass  components  of  B+P  and 
the  biorthogonal  wavelet  systems.  Eqs.  (24)  and  (25)  are 
used  for  the  purposes  of  sequential  decomposition  with 
reversible  process.  In  a  generalized  B+P  transformation 
(i.e.,  Eq.  (20)  or  (21)),  the  PR  property  may  or  may  not  exist 
due  to  the  non-causal  characteristics  of  the  filters. 


V. Haar Plus Prediction and Composite (H+PC Transforms)

In Section III, we showed that the high-pass process of an orthogonal wavelet transform can be computed through the Haar basis (i.e., Eqs. (13) and (14)). The high-pass process actually computes extended difference values summarized in Eq. (8). With a computer implementation, each transform coefficient must be stored as a real number. Therefore, the truncation and approximated rational values used in Section III should be abandoned when a prediction is used to compute the decomposed coefficients of an orthogonal wavelet transform. Similar to Eq. (14), the low-pass process of an orthogonal dyadic wavelet transform can be computed through Haar bases and becomes H+PC, which stands for generalized prediction including both the high-pass and low-pass processes. The low-pass component in H+PC is:


/„  =  L  +  a„ 


■■tn  + 


...(26) 


where  a^  is  the  added  composite  value.  The  use  of 
polynomial  terms  for  the  generalization  of  composite  value 
would  increase  the  complexity  of  the  system.  The  PR 
property  of  each  system  would  require  further  investigation. 
Additionally,  its  application  is  not  observed  in  the  low-pass 
component.  To  show  that  the  low-pass  of  orthogonal 
wavelets  is  a  subset  of  H+PC  systems,  we  let 
( . )xri 


+(  0,  0,  |...)xn 


+  (  4,  0- 

V  2’  2  ^  ...(27) 

+  ( . )xiy 

+  (  0,  0,-1,  l...)x/l, 

+  (-l,  1,  0,  0...)xyio 

={hQ,hi,h2,h^—) 

The  solution  for  the  above  equation  is  straightforward: 

Yi  =  {h2i  +  h2i+i)  and  i,- =(/i2y+i for  i&j 
=0,...,m/2.  ...(28) 
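
The conversion from Haar coefficients to wavelet low-pass coefficients can be illustrated numerically. The sketch below is hedged: the function names are hypothetical, the low-pass output is assumed to be indexed as $y_n = \sum_u h_u x_{2n+u}$, the Haar difference is assumed to be $d_k = x_{2k} - x_{2k+1}$, and the sign of the difference weight, $(h_{2j} - h_{2j+1})/2$, is chosen so that the D4 values agree with Table II.

```python
import math

def daubechies4():
    """Standard closed-form D4 low-pass coefficients h_0..h_3."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]

def composite_weights(h):
    """Weights on the Haar averages (gamma) and differences (lam)."""
    pairs = len(h) // 2
    gamma = [h[2 * i] + h[2 * i + 1] for i in range(pairs)]
    lam = [(h[2 * i] - h[2 * i + 1]) / 2.0 for i in range(pairs)]
    return gamma, lam

def lowpass_direct(x, h):
    """Low-pass coefficients computed directly from the filter (assumed indexing)."""
    taps = len(h)
    return [sum(h[u] * x[2 * n + u] for u in range(taps))
            for n in range((len(x) - taps) // 2 + 1)]

def lowpass_from_haar(x, h):
    """The same coefficients assembled from untruncated Haar averages and differences."""
    gamma, lam = composite_weights(h)
    b = [(x[2 * k] + x[2 * k + 1]) / 2.0 for k in range(len(x) // 2)]
    d = [x[2 * k] - x[2 * k + 1] for k in range(len(x) // 2)]
    pairs = len(gamma)
    return [sum(gamma[i] * b[n + i] + lam[i] * d[n + i] for i in range(pairs))
            for n in range(len(b) - pairs + 1)]

h = daubechies4()
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
print(lowpass_direct(x, h))
print(lowpass_from_haar(x, h))
```

Both routes give the same numbers because each pair contributes $h_{2i} x_{2k} + h_{2i+1} x_{2k+1} = (h_{2i} + h_{2i+1}) b_k + \tfrac{1}{2}(h_{2i} - h_{2i+1}) d_k$ under the conventions assumed above.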

It  is  seen  that  the  main  contribution  to  the  low-pass 

filtering  comes  from  (  0,  0...)xyo*  When  it 

convolves  with  the  data  sequence,  the  result  is  IqYq  / V2  . 


Since  neither  nor  values  are  maintained  in  a  computer 

implementation,  the  inverse  sequential  reconstruction  using 
Eq.  (11)  is  no  longer  a  valid  method.  Eqs.  (14)  and  (28) 
show  methods  to  convert  decomposed  Haar  transform 
coefficients  (scaled)  to  transform  coefficients  of  a  PR  two- 
band  transform  [2]  (including  dyadic  orthogonal  wavelets). 
The  inverse  transformation  of  the  transformed  coefficients 
should  follow  its  corresponding  dyadic  inverse  transform 
operation  (e.g.,  inverse  wavelet  transform).  In  contrast  to 
the  high-pass  process,  the  summation  of  weights  contributed 

by  difference  values  is  0  (i.e.,  =— 

j  2  « 

other  words,  the  low-pass  coefficients  can  be  made  not  only 
by  the  average  values  at  adjacent  pixels  but  also  can  be 
contributed  from  the  composed  difference  values  (i.e.,  or 
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).  For  the  purpose  of  entropy  reduction,  these  additional 


contributions  may  not  be  necessary.  However,  they  are 
embedded  in  the  low-pass  process  of  wavelet  transformation 
and  dyadic  decomposition  for  the  purposes  of  perfect 
reconstruction  using  convolution  operation.  Since  the 
compression  relies  on  a  good  prediction  in  high-pass 
domain,  usually  the  low-pass  domain  is  further  decomposed 
into  a  multi-level  dyadic  decomposition  to  obtain  a  greater 
number  of  high  frequency  elements. 

From  the  above  derivations,  we  find  that  the  two-band,  H+P, 
and  S+P  transforms  are  special  cases  of  the  H+PC 
transform.  Figures  1,  2,  and  3  summarize  the  forward  and 
inverse  processes  in  the  three  transformation  systems  and  the 
relationships  among  them.  These  three  figures  show  that  the 
decomposition  procedure  can  be  computed  through  Haar 
transform.  The  decomposed  coefficients  in  discrete  Haar 
transform  can  then  be  converted  into  a  doublet-type  dyadic 
transform  through  a  corresponding  prediction  method 
discussed  above. 


[Figure 1 diagram. Legible block labels: Haar transform, Truncation, Prediction, Reverse prediction, Inverse S transform.]

Figure 1. The decomposition and reconstruction processes of an S+P transform [6], which is a truncation version of an H+P transform. Bold and plain lines represent the forward and inverse transforms, respectively. The jointed plain line indicates a composition of two sources of data. $\otimes$ stands for a convolution process with a segment of the data.

Figure 2. The decomposition and reconstruction processes of an H+P transform.


through a binomial basis decomposition (i.e., B+P transform). Similar to Eq. (28), the low-pass coefficients can be converted from the decomposed coefficients of the binomial filtering to the B+PC transform. By replacing $\alpha'_i$ with $\gamma'_i$ and $\beta'_j$ with $\lambda'_j$ in Eq. (22), the solution for the weights in terms of the filter coefficients ($k_u$) is:

Y\mi2\  =  +2*„)  and  -2kJ/4 , 

jy-x  =  Yi  =  (.2k  2i  +  k2i+i  +  A:2,_i  )  -  Vm 
\^-j=^j  =(hj^l  -2^2;  +*22-|)/4-/1;>, 
for  i&j=0,...,(m’2)/2.  ...(29) 

For  most  binomial-based  decomposition  systems,  the 
main  contribution  to  the  low-pass  filtering  comes  from  the 
central  filtering 

component:  (...0,  0,  1,  0,  0...)  Again,  the 

inverse  transformation  using  the  transformed  coefficients 
should  follow  its  corresponding  dyadic  inverse  transform 
operation  (e.g.,  inverse  biorthogonal  wavelet).  In  addition, 
the  summation  of  weights  contributed  by  difference  values  is 

0  (i.e.,  =  0 )  low-pass  process. 

7  4,, 

Similar  to  the  doublet  system,  an  additional  composite  term 
in  the  low-pass  wing  of  the  B+P  is  added  to  form  B+PC. 
Eqs.  (24)  and  (29)  indicate  that  a  unique  set  of  a',Pj,Y' 

and  can  be  found  in  the  B+PC  system  to  match  a 
binomial-based  wavelet  or  a  triplet  two-band  system. 
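
As a numerical illustration of this matching, the sketch below rebuilds a symmetric low-pass filter from composite weights by summing shifted copies of the triplet kernels (1/4, 1/2, 1/4) and (-1, 2, -1), then compares the result with the 9-tap analytic filter listed in Table III. The function name is hypothetical, and the assumptions are that the weights are symmetric in their index and that each triplet is centered at an even position; the six-decimal table values limit the agreement to about 1e-6.

```python
def triplet_expand(gamma, lam):
    """Sum shifted triplet kernels weighted by gamma[|i|] and lam[|i|].

    gamma and lam hold the weights for indices 0..M (assumed symmetric in +/-i).
    Returns filter taps for positions -(2M+1) .. (2M+1).
    """
    M = len(gamma) - 1
    half = 2 * M + 1
    taps = [0.0] * (2 * half + 1)
    for i in range(-M, M + 1):
        g, l = gamma[abs(i)], lam[abs(i)]
        c = 2 * i + half              # array position of the triplet center
        taps[c - 1] += g / 4 - l      # (1/4)*g + (-1)*l
        taps[c] += g / 2 + 2 * l      # (1/2)*g + 2*l
        taps[c + 1] += g / 4 - l
    return taps

# Composite weights and analytic low-pass filter as listed in Table III.
gamma = [1.966641, -0.309359, 0.033146]
lam = [0.005524, -0.011049, 0.008286]
k = [0.994369, 0.419845, -0.176777, -0.066291, 0.033146]

rebuilt = triplet_expand(gamma, lam)
expected = [0.0] + k[:0:-1] + k + [0.0]   # positions -5..5
print(max(abs(a - b) for a, b in zip(rebuilt, expected)))
```

The outermost taps (positions -5 and 5) cancel to within rounding, which is why the composite weights of index 2 satisfy $\lambda'_2 = \gamma'_2 / 4$ in the table.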

The  relationships  between  binomial,  B+PC,  B+P,  Sb+P, 
triplet  two-band  and  binomial-based  wavelet  (i.e.,  wavelets 
that  satisfy  (-l)''i2«+i  =(-!)" *-2n+i  including  WPS- 
biorthogonal  transforms)  are  exactly  the  same  as  their 
counterparts developed through the Haar transform. In other words, system processing diagrams similar to those shown in Figures 1, 2, and 3 can be applied to their binomial versions by replacing H with B, S with Sb, doublet with triplet, and wavelet with binomial-based wavelet processes, respectively.

Figure 3. The decomposition and reconstruction processes of a dyadic PR transform [2] (e.g., orthogonal wavelet). The decomposition can be generated from an H+PC transform.

VI.  Binomial  Plus  Prediction  and  Composite  (B+PC 
Transforms) 

In  Section  IV,  Eq.  (23)  shows  that  the  high-pass  process 
of  a  biorthogonal  wavelet  transform  can  be  computed 


VII. Switching Transformation Kernels in Dyadic Decomposition - An Adaptive Approach

Since a data string can contain a variety of data patterns in different data segments, one may wish to use different decomposition kernels for each of them. This can be done by switching from an S+P and/or an Sb+P transform to another one (and likewise for H+P and B+P decompositions) because the starting and ending processes of a data segment using either method are systematically the same. However, the transition from the first to the second transform requires a modification of the algorithm. In addition, it is necessary to record the starting and ending points of a specific S+P or Sb+P transform. This overhead can be reduced if the corresponding decomposed data possess a marker and the applied transforms are pre-registered. The following algorithms show the application of an S+P transform on a data segment ($x_i$, i = 0,...,M-1) and an Sb+P transform on the following data segment ($x_i$, i = M,...,N-1). For the first segment decomposition using an S+P transform:

$$b_n = \lfloor (x_{2n} + x_{2n+1})/2 \rfloor \quad \text{for } n = 0, 1, \ldots, (M/2)-1,$$ ...(30)


y.  and  Ay  to  zero  and  C  to  unity,  and  (iii)  use  of 
approximated  but  sufficiently  accurate  rational  values  of  or,- 
and  Pj  for  the  purposes  of  fast  computation. 


^=x^x-x^  forn=Qi..ip2\buildupterri\ 

/orn=(/n'2)+l,...(M/^-l , 

where  m  is  the  total  length  of  contribution  weights  in  . 
For  the  second  segment  decomposition  using  an  Sb+P 


transform: 

$$b'_n = \lfloor (x_{2n-1} + 2x_{2n} + x_{2n+1})/4 \rfloor \quad \text{for } n = (M/2), \ldots, (N/2)-1,$$ ...(32)


\buiMpteri\ 


...(33) 


where  m’  is  the  length  of  high-pass  kernel  in  d'^  .  The 
reconstruction  for  both  transforms  are  almost  independent 


except that $x_{M-1}$ is shared. Care must be taken with the overlapped convolution elements. In case a process switches from an S+P to another S+P transform (or from an Sb+P to another Sb+P transform), the build-up coefficients can be ignored in the second transform. The above derivations are also applicable to the corresponding H+P and B+P transforms by turning all the truncation operations off and using real computation.
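
The kernel switch can be sketched in Python for the truncated averages and differences alone. This is a simplified illustration under stated assumptions, not the algorithm of Eqs. (30)-(33): the prediction terms and build-up coefficients are omitted, the function names are hypothetical, M and N are assumed even with M at least 2, and the S-type difference is taken as $x_{2n} - x_{2n+1}$.

```python
def switched_forward(x, M):
    """S transform on x[0:M], then Sb transform on the remaining samples."""
    low, high = [], []
    for n in range(M // 2):                       # first segment (S)
        a, b = x[2 * n], x[2 * n + 1]
        low.append((a + b) // 2)
        high.append(a - b)
    for n in range(M // 2, len(x) // 2):          # second segment (Sb)
        left, mid, right = x[2 * n - 1], x[2 * n], x[2 * n + 1]
        low.append((left + 2 * mid + right) // 4)
        high.append(2 * mid - left - right)
    return low, high

def switched_inverse(low, high, M):
    """Sequential reconstruction; the Sb part starts from the shared sample x[M-1]."""
    x = []
    for l, d in zip(low[:M // 2], high[:M // 2]):
        a = l + (d + 1) // 2                      # inverse S transform
        x.extend([a, a - d])
    for b, d in zip(low[M // 2:], high[M // 2:]):
        mid = b + (d + 3) // 4                    # Eq. (17)
        x.extend([mid, 2 * mid - d - x[-1]])      # Eq. (18)
    return x

data = [5, -3, 12, 7, 0, 255, 31, 8, 9, 2]
low, high = switched_forward(data, 4)
print(switched_inverse(low, high, 4) == data)
```

In this sketch the decomposed data occupy exactly the same number of elements as the input, and the only coupling between the two segments is the last sample of the first segment, which the first triplet of the second segment reaches back to.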

In digital implementation, however, a sequentially decomposed data sequence with multiple wavelets or sub-band transforms using exactly the same length of data space for perfect reconstruction is also solvable by treating each segment independently. This can be done by using a mirrored data extension of each segment for each convolution based computation. This algorithm, however, does not seem to be performed as naturally as those derived in Eqs. (30)-(33). If the application does not require the same total length of the data sequence, additional data space (i.e., the length of each kernel) can be provided to accommodate the joint data between two decomposition processes.


VIII.  Results  of  Predictive  and  Composite  Terms 
Converting  from  Known  Filters 


VIII.  A.  Filters  in  the  Doublet  System 


By  replacing  filter  coefficients  (/i„)  of  Daubechies 


VIII.B. Filters in the Triplet System


There  are  several  known  biorthogonal  filters  and  2-band 


filters  proposed  in  the  literature.  Typically,  the  reported 
filters  are  quadrapole.  The  analytic  filters  are  composed  of 


{k^)  for  low-pass  and  ((-1)'“^^^^)  for  high-pass.  The 


synthetic  filters  are  composed  of  {k^)  for  low-pass  and 


((-l)l“l^^)  for  high-pass.  By  replacing  filter  coefficients 
( )  into  Eq.  (23),  the  corresponding  composite  weights  (i.e., 
y'i  and  A'-  values)  can  be  obtained.  The  predictive  weights 
( a'  and  yS'  values)  can  be  obtained  by  using  the  synthetic 

filter coefficients ($\tilde{k}_u$) for Eq. (29). Tables III-VII show five sets of original filters and their conversions into predictive and composite weights associated with the B+PC transforms. Table VIII shows the composite and predictive weights of a PR quadrature mirror filter proposed in [18].


IX.  A  Unified  Perspective  of  Dyadic  Decomposition  - 
Summary 

The  split  pair  (i.e.,  (1,0)  and  (0,1))  of  delta  functions, 
called  singlet  basis  system,  has  the  lowest  basis  of  dyadic 
decomposition.  The  corresponding  decomposition  forms 
are: 

D+PC  (delta+  prediction  D+P  Transforms 

&  composite)  Transforms 

first-path:  =  X2n  +  )  ==  ^2n  -(34) 

and 

second-path:  t'„  =X2^i +e  =  ^2n-fi  +  «'(-*2«-n±i ) 

...(35) 

where  i  =  -p,. . .,  0,  . . .,  q;  and  e"  are  the  added  process  and 
prediction  terms,  respectively.  The  well-known  DPCM  is  a 
special  case  of  the  D+PC  transform  by  giving  a''(.)=  -x^^^and 
.  In  many  applications,  spline  interpolation 
methods (called Sd+P transform in this paper) are used for the prediction of e''. Using high-order polynomial terms to model the data sequence can be accomplished by an iterative search using the orthogonal least square (OLS) method [19] or

wavelets  [9]  into  Eq.  (14),  ^  and  values  can  be 

obtained.  The  second  column  of  Table  II  shows  nine  sets  of 
and  Pj  values  as  H+P  transforms  corresponding  to  the 

high-pass  processes  of  Daubechies  wavelets.  Using  the 
same  filter  coefficients  for  Eq.  (28),  the  right  column  on 
Table  II  shows  nine  sets  of  j/.  and  associated  with  the 

low-pass  components  of  H+PC  transforms.  Substituting  the 
values  of  C,  and  Pj  into  Eq,  (8)  and  the  values  of 

and  2,.  into  Eq.  (26),  it  would  produce  exactly  the  same 

high-pass  and  low-pass  components,  respectively,  as 
Daubechies  wavelets  would.  In  S+P  transforms,  however, 
approximation  is  made  by  (i)  downward  truncation,  (ii)  set 


other modeling techniques [20]. Although it takes substantial modeling effort for each image pattern, polynomial prediction is one of the interesting data decomposition schemes in recent publications [21]. Non-linear polynomial terms can also be added onto the H+PC and B+PC systems, as shown in Eqs. (10) and (21); their applications in data decomposition should be of interest to investigators in the field.
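
A singlet-based decomposition with an interpolating predictor can be sketched as follows. This is only an illustration of the general idea behind Eqs. (34) and (35), not the paper's formulation: the function names are hypothetical, the first path simply keeps the even samples, the predictor is a truncated linear interpolation between the two neighboring even samples (a stand-in for the spline predictors mentioned above), and the last odd sample is predicted from its left neighbor alone.

```python
def dp_forward(x):
    """Split an even-length integer list into even samples and prediction residuals."""
    first = x[0::2]
    second = []
    for n in range(len(x) // 2):
        left = x[2 * n]
        right = x[2 * n + 2] if 2 * n + 2 < len(x) else left  # assumed boundary rule
        second.append(x[2 * n + 1] - (left + right) // 2)
    return first, second

def dp_inverse(first, second):
    """Rebuild the sequence from the even samples and the residuals."""
    x = []
    for n, (even, residual) in enumerate(zip(first, second)):
        right = first[n + 1] if n + 1 < len(first) else even
        x.extend([even, residual + (even + right) // 2])
    return x

ramp = list(range(8))
first, second = dp_forward(ramp)
print(first, second)
print(dp_inverse(first, second) == ramp)
```

On a linear ramp the residuals vanish everywhere except at the boundary, which illustrates why a good predictor in the second path lowers the first-order entropy of the decomposed data.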

By  comparing  three  generalized  expressions  (i.e.,  Eqs. 
(10),  (20)  and  (35)),  we  find  that  both  doublet  and  triplet 
systems  can  be  formed  by  the  generalized  singlet 
decomposition  system.  Although  the  doublets  and  triplets 
seem  to  function  independently,  they  share  exactly  the  same 
decomposition  principles.  We  can  integrate  major  dyadic 
decomposition  methods  through  a  unified  framework. 


(Submitted to IEEE Signal Processing for Review)


Table II. The Weights of Predictive and Composite Terms for the Daubechies Filter Coefficients

Name  C               Index  α                β                Index  γ                λ
D4    1.515749527851    0    -0.535898384862   1.000000000000    0     1.319479216883  -0.176776695297
                       -1     0.535898384862   0.071796769725    1     0.094734345491   0.176776695297
D6    1.755060181656    0    -0.832286317816   1.000000000000    0     1.139562062261  -0.237110478180
                       -1     1.044065157711   0.285080113551    1     0.324866482108   0.297444261064
                       -2    -0.211778839890  -0.044065157715    2    -0.050214982000  -0.060333782882
D8    2.115899710319    0    -1.025087303111   1.000000000000    0     0.945224383862  -0.242234378622
                       -1     1.394091283712   0.637834792253    1     0.602896998513   0.329432268674
                       -2    -0.461004174828  -0.165244816522    2    -0.156193429883  -0.108938096777
                       -3     0.092000194228   0.023577057747    3     0.022285609882   0.021740206726
D10   2.618035204426    0    -1.161692571582   1.000000000000    0     0.763931667771  -0.221863435912
                       -1     1.533855467064   1.129337492784    1     0.862736674339   0.292940191269
                       -2    -0.549918340455  -0.359377373963    2    -0.274539756651  -0.105025008740
                       -3     0.219425342839   0.093372230314    3     0.071330003627   0.041906492027
                       -4    -0.041669897860  -0.012101902702    4    -0.009245026714  -0.007958238642
D12   3.299433666451    0    -1.263957432420   1.000000000000    0     0.606164633748  -0.191541573524
                       -1     1.438168880348   1.759232063963    1     1.066384259730   0.217941778156
                       -2    -0.318388177157  -0.587351260219    2    -0.356031561532  -0.048248913199
                       -3     0.230890210880   0.206254974567    3     0.125024471117   0.034989370028
                       -4    -0.106030209385  -0.051187739089    4    -0.031028197117  -0.016067940760
                       -5     0.019316727734   0.006103880398    5     0.003699956426   0.002927279298
D14   4.215928263960    0    -1.343562649551   1.000000000000    0     0.474391373567  -0.159343632698
                       -1     1.093400166579   2.527268506668    1     1.198914378251   0.129674901720
                       -2     0.337823095148  -0.775608936892    2    -0.367942188923   0.040065090533
                       -3    -0.039222424363   0.320245765170    3     0.151921828418  -0.004651694942
                       -4     0.230208564525   0.045227203738    4     0.021455395304   0.027302239283
                       -5     0.051103039635   0.027362589736    5     0.012980576529   0.006060710291
                       -6    -0.009086819972  -0.003052177979    6    -0.001447926904  -0.001077677252
D16   5.445326519367    0    -1.407375942321   1.000000000000    0     0.367287433157  -0.129227874336
                       -1     0.491582583521   3.433238673897    1     1.260985419951   0.045138026321
                       -2     1.460362721375  -0.816376007316    2    -0.299844648218   0.134093218853
                       -3    -0.698498943693   0.351822304627    3     0.129219911194  -0.064137471023
                       -4     0.145493422954  -0.167328226846    4    -0.061457554933   0.013359476464
                       -5     0.028505924229   0.061878299970    5     0.022727121964   0.002617466935
                       -6    -0.024387508070  -0.014326908277    6    -0.005262093366  -0.002239306310
                       -7     0.004317742010   0.001519171558    7     0.000557972622   0.000396463095
D18   7.094396788531    0    -1.459719865014   1.000000000000    0     0.281912621977  -0.102878363625
                       -1    -0.372207203729   4.476958828200    1     1.262111201741  -0.026232477180
                       -2     3.025555678304  -0.567822747103    2    -0.160076399454   0.213235583552
                       -3    -1.740833956566   0.183390036787    3     0.051699966115  -0.122690766280
                       -4     0.697794301300  -0.130916974640    4    -0.036907147582   0.049179255270
                       -5    -0.156862185552   0.080211411183    5     0.022612609239  -0.011055357504
                       -6    -0.003133602695  -0.031941487319    6    -0.009004708440  -0.000220850538
                       -7     0.011473492089   0.007371194069    7     0.002078032647   0.000808630560
                       -8    -0.002066672340  -0.000754190669    8    -0.000212615869  -0.000145655255
D20   9.308956243593    0    -1.503459195971   1.000000000000    0     0.214846857979  -0.080753571089
                       -1    -1.501142274445   5.658263936561    1     1.215660228386  -0.080628925261
                       -2     4.943230475727   0.145805806185    2     0.031325919334   0.265509383994
                       -3    -3.009730910234  -0.319189839154    3    -0.068576934041  -0.161657807356
                       -4     1.530871927260   0.100830971613    4     0.021663217438   0.082225755885
                       -5    -0.583394250868   0.017478204114    5     0.003755137237  -0.031335105440
                       -6     0.133487910271  -0.033170705790    6    -0.007126621916   0.007169864525
                       -7    -0.005557945353   0.015768241034    7     0.003387757042  -0.000298526774
                       -8    -0.005300425106  -0.003734397410    8    -0.000802323550  -0.000284694920
                       -9     0.000994688719   0.000373868474    9     0.000080324467   0.000053426437

Remark 1: The rational values for each set of $\alpha_i$ and $\beta_j$ for the implementation of the S+P transform can be adjusted based on the dynamic range of the applied data sequence if the approximation raises a concern. For data decomposition, however, any set of these weights would produce a PR result anyway.

Remark 2: The C values do not apply to $\gamma_i$ and $\lambda_j$, which have been discussed in Section V.

Figure 4 shows the unified perspective of dyadic decomposition subsystems. This diagram also summarizes the dyadic decomposition systems discussed in this paper. The relationships among major decomposition methods, which are of interest to many investigators in current data compression research, are shown in Table IX.

In  this  paper,  we  attempt  to  use  prediction  as  a  key 
expansion  to  generalize  decomposition  of  a  data  sequence. 
Data  decomposition  through  a  prediction  could  produce  low 
entropy  and  result  in  high  compression  efficiency.  As  far  as 
data  compression  is  concerned,  all  dyadic  decomposition 
methods  can  be  unified  by  the  following  three  statements: 


(a) The singlet (i.e., (1,0) and (0,1)), Haar (i.e., (1,1) and (1,-1) as a pair of doublets), and binomial bases (i.e., (1,2,1) and (-1,2,-1) as a pair of triplets) are filter elements in the dyadic decomposition family.

(b) The low-pass coefficients are weighted average values of the neighbor pixel values and their composed values.

(c) The high-pass coefficients are the weighted difference values of the neighbor pixel values and their composed values.


Table III. A Spline Filter on Table I of [13].

u, i, j                    0          ±1         ±2         ±3         ±4
k_u (analytic low-pass)    0.994369   0.419845  -0.176777  -0.066291   0.033146
γ'_i                       1.966641  -0.309359   0.033146
λ'_j                       0.005524  -0.011049   0.008286
k_u (synthetic low-pass)   0.707107   0.353553
α'_i                       0.000000
β'_j                       1.000000
C_0' = 2.828427


Table IV. A Spline Variant Filter Proposed in [14] and on Table II of [13].

j:    0, ±1, ±2, ±3, ±4
K:    0.852699, 0.377403, -0.110624, -0.023849, 0.037829
y'i:  1.655203, 0.037829, 0.012549, 0.009457
K:    [illegible], 0.418092, -0.040690, [illegible]
a':   0.201599
Pj:   1.000000, -0.096803
Co = 2.28949


Table V. A Laplacian Pyramid Filter Proposed in [15] and on Table III of [13].

j:    0, ±1, ±2, ±3
K:    0.848528, 0.353553, 0.070711
y'i:  1.555635, 0.070711, 0.035355
K:    0.858630, 0.368706, 0.075761, 015152
[label illegible]:  0.223602
[label illegible]:  1.000000
Co = 2.459502


Table VI. A 13-11-Tap Biorthogonal Filter [16].

j:    0, ±1, ±2, ±3, ±4, ±5, ±6
K:    0.767245, 0.383269, [illegible], -0.033474, 0.047281, 0.003759, -0.008473
y'i:  1.608249, -0.143345, 0.054799, -0.008473, 0.009941, -0.002118
K:    1.832847, 0.448109, [illegible], -0.108737, [illegible], 0.014182
[label illegible]:  0.357840, -0.044704
Pj:   1.000000, -0.159502, 0.017548
Co = 2.025415

Table VII. A 5-3-Tap Biorthogonal Filter [17, 16].

j:    0, ±1, ±2
K:    1.060660, 0.353553, -0.176777
[label illegible]:  1.767767, -0.176777
[label illegible]:  -0.044194, -0.088388 (other entries illegible)
K:    0.707107, 0.353553
[label illegible]:  0
[label illegible]:  1.000000
Co = 2.828427


Table VIII. A 9-tap Quadrature Mirror Filter (QMF) [18]

j:    0, ±1, ±2, ±3, ±4
K:    0.798430, 0.413948, -0.073882, -0.060394, 0.028220
Yi:   1.747113, -0.194670, 0.028220
[label illegible]:  -0.037563, 0.011727, 0.007055
K:    0.798430, 0.413948, -0.073882, -0.060394, 0.028220
[label illegible]:  -0.344004, 0.107392, 0.064610
Pj:   1.000000, -0.111424, 0.016153
Co = 2.289491
[Figure 4 diagram. Three decomposition systems are shown with their bases and hierarchical structure: the singlet decomposition (D+PC, D+P, Sd+P); the doublet system built on the Haar wavelet (H+PC: doublet two-band wavelets; H+P; S+P); and the triplet system built on the binomial wavelet (B+PC: triplet two-band biorthogonal wavelets; B+P; Sb+P).]

Figure 4. A diagram illustrating the unification of major dyadic decomposition systems.


Table IX. Relationships among Dyadic Transforms in Data Decomposition

Dyadic decomposition method: Remarks

H+PC transform: Generalized form based on doublets. Not all of them are PR.

B+PC transform: Generalized form based on triplets. Not all of them are PR.

H+P transform: Special cases of H+PC. An adaptive transform can be implemented and co-exist with B+P. They are PR.

B+P transform: Special cases of B+PC. An adaptive transform can be implemented and co-exist with H+P. Causal and some non-causal cases are PR.

Discrete orthogonal wavelet transform: Special cases of H+PC transform. High-pass can be exactly described by H+P. Predictive and composite terms of Daubechies wavelets are shown in Table II.

Discrete biorthogonal wavelet transform: Special cases of B+PC transform. High-pass can be exactly described by a B+P transform.

Two-band decomposition: Special cases of H+PC or B+PC transform.

S+P transform: Special cases of H+P transform. An adaptive transform can be implemented and co-exist with Sb+P. Rational computation is always PR.

Sb+P transform: Special cases of B+P transform. An adaptive transform can be implemented and co-exist with S+P. Only causal filtering cases are PR.

X.  Conclusions  and  Discussion 

Although our study aims at decomposition methods for data compression, base transformations and their extensions, including wavelet transforms and their integer versions, are members of the same family. We also showed that a transform in the dyadic family can be computationally constructed from its base transform. As examples, we showed that analytical and synthetic wavelets can be computed through either Haar or binomial bases. This approach, which is extensively described in the paper, can be regarded as part of an element theory for dyadic transformations as a whole.

The three transforms (i.e., S+P, Sb+P, and Sd+P) based on sequential operations are appropriate for digital data processing. Since they use rational numbers in the computer implementation, the computational speed of the forward and inverse transforms can be greatly increased over convolution-based transforms implemented with real numbers. This is particularly noticeable in data compression. In a lossy compression scheme, a quantization process would be added prior to the coding, so the accuracy advantage of a real-number implementation would be diminished. Except for the triplet system, the S family transforms would perform much more effectively than real-number computation methods. They are perfectly reconstructable and can provide entropy as low as their counterparts using real-number computation. Additionally, their adaptive implementations are readily available.
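
To make the role of rational-number arithmetic concrete, the sketch below subtracts a rounded prediction from each high-pass value, where the prediction is formed from neighboring low-pass values with weights of 1/4. The weights, the edge replication, and the function names are illustrative assumptions, not the coefficients tabulated in this paper. Because the prediction depends only on the low-pass band, the step can be undone exactly.

```python
def _prediction(low, k):
    """Rounded prediction floor((low[k-1] - low[k+1]) / 4 + 1/2), integers only."""
    n = len(low)
    left = low[max(k - 1, 0)]        # replicate the first value at the left edge
    right = low[min(k + 1, n - 1)]   # replicate the last value at the right edge
    return (left - right + 2) // 4


def predict_high(low, high):
    """Replace each high-pass value by its prediction residual."""
    return [h - _prediction(low, k) for k, h in enumerate(high)]


def unpredict_high(low, residual):
    """Invert predict_high; exact because the same prediction is recomputed."""
    return [r + _prediction(low, k) for k, r in enumerate(residual)]


# Bands of the integer S transform of the ramp 0, 2, 4, ..., 14.
low = [1, 5, 9, 13]
high = [-2, -2, -2, -2]
residual = predict_high(low, high)
recovered = unpredict_high(low, residual)
```

For this ramp the interior residuals vanish, and `recovered` equals `high`, so no information is lost by the rounding.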

As a part of the results of this paper, Table II shows that all Daubechies wavelets can be reformulated by the linear prediction terms (i.e., $\alpha_j$, $\beta_j$, $\gamma_j$, and $\lambda_j$). When approximating the high-pass process of the decomposition using the S+P system, $\gamma_j$ and $\lambda_j$ are set to zero and C is set to unity. In addition, $\alpha_j$ and $\beta_j$ can be approximated with rational numbers for fast computation.

Having drawn the unified perspective, one should be able to explore optimal dyadic decomposition (or wavelet) methods for a defined data pattern. Specifically, we can search for a set of predictive contribution weights that minimizes the entropy of the decomposed coefficients. Using H+GP as an example, a minimum entropy can be searched in a 2r-space for a given image pattern. Because the polynomial order (r) and the kernel length are in single digits, a full search using Eqs. (1) and (9) through the hierarchical decomposition tree is practically possible. Since the low-resolution data sequence possesses fewer structures, the use of high-order polynomial terms for modeling optimal decomposition at high levels of the tree may not be necessary. This modeling method shares the same spirit as [5] and can be a highly effective method in searching for appropriate decomposition kernels. In the case of a linear system, the contribution weights can be associated with a compactly supported wavelet, and massive data patterns in detail structures can be documented if we find that the result obtained from a kernel is significantly different from the others [22]. These applications can be greatly useful in various fields of digital signal processing.
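
The entropy search described above can be sketched for the simplest case of a single rational weight. The example below is a hypothetical one-parameter search, not the search over Eqs. (1) and (9): it assumes a first-order (histogram) entropy estimate, a prediction of the form weight × (left low-pass value minus right low-pass value), and candidate weights of num/8; all names are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """First-order entropy in bits per sample of a sequence of integers."""
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in Counter(values).values())


def residual(low, high, num, den=8):
    """High-pass residual after a rounded prediction with weight num/den."""
    n = len(low)
    out = []
    for k in range(n):
        slope = low[max(k - 1, 0)] - low[min(k + 1, n - 1)]
        out.append(high[k] - (num * slope + den // 2) // den)
    return out


def best_weight(low, high, candidates=range(0, 9), den=8):
    """Exhaustively pick the numerator whose residual has the lowest entropy."""
    return min(candidates, key=lambda num: entropy(residual(low, high, num, den)))


low = [10, 12, 17, 25, 30, 31, 29, 20]
high = [-1, -2, -3, -3, -1, 0, 3, 4]
num = best_weight(low, high)
gain = entropy(high) - entropy(residual(low, high, num))
```

Since a weight of zero leaves the high-pass band unchanged and is among the candidates, `gain` is never negative; with more parameters the same exhaustive loop extends to a small grid, which is feasible for the single-digit orders mentioned above.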
Nomenclature:

Symbols  of  the  three  main  decomposition  systems 
abbreviation of the three bases, namely: delta, Haar, and binomial systems,
low-pass  components  of  the  three  bases, 
high-pass  components  of  the  three  bases. 

integer  versions  of  the  three  basis  systems  using  rational  number  computation. 

integer  versions  of  1",  1,  and  \\  respectively. 

integer  versions  of  t",  t,  and  t’,  respectively. 

three  extended  systems  with  prediction  in  the  high-pass  process. 

added  general  prediction  with  polynomial  terms  in  the  high-pass  process. 

added  linear  prediction  terms  in  the  high-pass  process. 

scaled  low-pass  components  of  the  three  bases. 

scaled  high-pass  components  of  the  three  bases. 

high-pass  components  of  the  three  generalized  prediction  systems. 

weights of the low-pass component for the prediction in the r-th polynomial term.

weights of the high-pass component for the prediction in the r-th polynomial term.

scaling factors between H+P and Haar and between B+P and binomial.
three  sequential  type  (S)  transforms  with  prediction, 

high-pass  components  of  the  three  sequential  type  transforms, 
dyadic  systems  with  generalized  high-pass  and  low-pass  processes, 
added  composite  terms  in  the  low-pass  process. 

low-pass  components  of  the  three  generalized  systems, 
weights  of  the  low-pass  components  in  the  composite  term, 
weights  of  the  high-pass  components  in  the  composite  term. 


Filter  coefficients 

hu  :  orthogonal  wavelet  coefficients. 

K  :  analytic  and  synthetic  filter  coefficients  of  biorthogonal  wavelets 
Ku  :  altered  coefficients  in  biorthogonal  wavelet  for  the  approximation  of  causal  filter  in  B+P. 
Abstract: A multiple circular path convolution neural network (MCPCNN) architecture specifically designed for the analysis of tumor and tumor-like structures has been constructed. We first divided each suspected tumor area into sectors and computed the defined mass features for each sector independently. These sector features were used on the input layer and were coordinated by convolution kernels of different sizes that propagated signals to the second layer of the neural network system. The convolution kernels were trained, as required, by presenting the training cases to the neural network.

In this study, randomly selected mammograms were processed by a dual morphological enhancement technique. Radiodense areas were isolated and delineated using a region growing algorithm. The boundary of each region of interest was then divided into 36 sectors using 36 equi-angular dividers radiated from the center of the region. A total of 144 BI-RADS based features (i.e., 4 features per sector for 36 sectors) were computed as input values for the evaluation of this newly invented neural network system. Using the conventional feed-forward neural network, the overall performance in the detection of mammographic masses was 0.78-0.80 for the areas (Az) under the ROC curves. The performance was markedly improved with the MCPCNN, with Az values ranging from 0.84 to 0.89. This paper does not claim to present the best-performing mass detection system. Instead, it reports a potentially better neural network structure for analyzing a set of mass features defined by an investigator.

Key words: Mammographic masses, computer-aided diagnosis, neural network, convolution neural network, sector features, and BI-RADS.

1.  Introduction 

It is known that effective treatment of breast cancer calls for early detection of cancerous lesions (e.g., clustered microcalcifications and masses associated with malignant cellular processes) [1-3]. Breast masses appear as areas of increased density on mammograms. It is particularly difficult for radiologists to detect and analyze a suspected area where a mass overlaps with dense breast tissue. These masses are more readily seen as time progresses, but the further the tumor has progressed, the lower the possibility of a successful treatment. Therefore, improving today's clinical system to increase the chances of early detection is of vital importance in breast cancer diagnosis.

Several research groups have developed computer algorithms for automated detection of mammographic masses [4-8]. Investigators have also attempted to classify the malignant or benign nature of the detected tumors [9-12]. The results of these detection programs indicate that a high true-positive (TP) rate can be obtained at the expense of 2 or 3 false-positive (FP) detections per mammogram. Mammographically, a multiplicity (more than two) of similar benign-appearing breast lesions argues strongly for benignity [13-16] and, indeed, the more masses that are identified, the less chance that they represent cancer [17]. If the computer indicates multiple suspicious locations on a mammogram, the radiologist has to seek out the one mass whose mammographic features differ from those of the others. The significant lesion may be missed due to the multiplicity of possible lesions. We therefore believe that a more useful and fundamental approach to computer-aided diagnosis (CADx) of masses is to devise computer programs that analyze the features of a suspected area [18,19] and provide feature measures and estimates of the likelihood of malignancy by making comparisons within a digital mammographic database. The computer therefore serves as a second opinion and also provides a reproducible and objective evaluation of the mass. With this aid, the radiologist may also increase his/her sensitivity by lowering the threshold of suspicion, while maintaining the overall specificity and reading efficiency.

2.  Clinical  Background  of  Breast  Lesions  and  Technical  Approach  in  Mass  Detection 
2.1 Description of clinical background

Most commonly, breast cancer presents itself as a mass. The same lesion shows a somewhat different picture from one projection to the other. The difficulty of detecting masses also varies with the underlying breast parenchyma. In the fatty breast, masses are generally easy to detect. In the dense breast, mass detection is more difficult and auxiliary signs aid this detection. When the breast contains one mass, the decision process is based on its size, shape, and margins. When there are several masses, one looks at each, trying to determine whether any has features that suggest cancer. Furthermore, one looks to see if any mass is different in appearance from the others. Multiple small, well-defined, similar masses that present themselves bilaterally are all likely to be benign. Large, poorly-defined, spiculated and unusually radiodense masses are extremely likely to be malignant. In this study, we used several computational features (see section 3.2) highly associated with four major features of breast masses routinely used in clinical reading:

Density - Malignant lesions tend to have greater radiographic density due to the higher attenuation and lower compressibility of cancer compared with normal tissue. Radiolucent lesions are typically benign, and the diagnosis can be made from the mammogram.

Size - If the lesion has morphological features suggesting malignancy, it should be considered suspicious regardless of its size. Isolated masses with non-cystic densities greater than 8 mm in diameter can be malignant. In general, the larger a lesion, the more suspicious it is.

Shape - The more irregular the shape of a lesion, the greater the likelihood of malignancy. Lesions tend to be round, ovoid and/or lobulated. Small and frequent lobulations are suspicious. Lesions in the lateral aspect of the breast near the edge of the parenchyma with a reniform shape and a hilar indentation or notch usually represent a benign intramammary lymph node. Breast carcinoma hidden in dense tissue can cause parenchymal retraction, which can take different shapes.

Margin - The margins of the lesion should be carefully evaluated for areas of spiculation, stellate patterns or ill-defined regions. Most breast cancers have ill-defined margins secondary to tumor infiltration and associated fibrosis. The appearance of spiculations and a more diffuse stellate pattern are almost pathognomonic for cancer. Lesions with sharply defined margins have a high likelihood of being benign; however, up to 7% of malignant lesions can be well circumscribed.

These are known clinical features and have been adopted in the “Breast Imaging - Reporting and Data System” (BI-RADS) [20] of the American College of Radiology (ACR). Figures 1(A) and 1(B) show two breast images containing masses. In Figure 1(A), a malignant mass is superimposed on the dense glandular tissue. However, its spiculated nature makes it easily identifiable. In Figure 1(B), another malignant mass is located on the fatty background but is associated with a large body of glandular tissue. This mass is not easily detectable by the computer because its density is lower than that of the neighboring glandular tissue. Furthermore, one end of the mass is fully connected with this tissue.

2.2.  Technical  approach  for  detection  of  mammographic  masses 

In  this  study,  our  goal  was  to  detect  clinically  suspicious  lesions.  The  differentiation  of  benign  and  malignant  status 
of  the  mammographic  masses  can  be  extended  from  this  study  model  and  will  be  reported  in  our  future  work.  The  study 
was  conducted  with  the  following  steps:  (1)  use  background  correction  method  and  morphological  operations  to  extract 
radio-opaque  areas,  (2)  delineate  the  boundary  of  the  areas,  (3)  compute  the  features  and  texture  of  the  masses  with 
emphasis  on  the  boundary,  and  (4)  design  training  strategy  using  neural  networks  as  classifiers  for  the  recognition  of 
mass  features.  The  overall  detection  scheme  of  the  study  framework  is  shown  in  Figure  2. 

3.  Development  of  Technical  Methods 

3.1.  Preprocessing  and  Extraction  of  Suspicious  Masses 

In  automatic  mass  detection,  accurate  selection  of  suspected  masses  is  considered  a  critical  first  step  due  to 
the  variability  of  normal  breast  tissue  and  the  lower  contrast  and  ill-defined  margins  of  masses.  In  our  previous  study 
[18],  we  aimed  to  improve  the  task  of  lesion  site  selection  using  model-based  image  processing  techniques  for 
unsupervised  lesion  site  selection.  We  focused  on  two  essential  issues  in  the  stochastic  model-based  image 
segmentation:  enhancement  and  model  selection.  Based  on  the  differential  geometric  characteristics  of  masses 
against  the  background  tissues,  we  proposed  one  type  of  morphological  operation  to  enhance  the  mass  patterns  on 
mammograms  by  removing  high  intensity  background  caused  by  breast  tissues  while  keeping  mass-signals  [18]. 
Then  we  employed  a  finite  generalized  Gaussian  mixture  (FGGM)  distribution  to  model  the  histogram  of  the 
mammograms  where  the  statistical  properties  of  the  pixel  images  are  largely  unknown  and  are  to  be  incorporated. 

We incorporated the EM algorithm with two information theoretic criteria to determine the optimal number of image regions and the kernel shape in the FGGM model. Finally, we applied a contextual Bayesian relaxation labeling (CBRL) technique to perform the selection of suspected masses.


Figure  1 .  (A)  Dense  breast  containing  a  malignant  mass.  (B)  Fatty  and  glandular  breast  containing  a  malignant  mass 




Figure  2.  A  system  flow  chart  for  the  detection  of  masses  in  this  study. 


We consistently processed the mammograms using this very effective pre-screening segmentation method. In that study [18], the FGGM method isolated 1,142 potential masses, including 114 of the 186 true masses, in 200 mammograms. The mammograms were collected from the Mammographic Image Analysis Society (MIAS) database and the Brooke Army Medical Center (BAMC) database. After morphological enhancement, 3,143 potential masses were extracted using the FGGM technique. Of them, 181 were true masses; 5 masses were still not extracted. The results demonstrated that more true masses were picked up after enhancement, although more false cases were also included. The areas missed on the original mammograms mainly occurred at the lower intensity side of shaded objects or were obscured by fibroglandular tissues; these areas were, however, extracted on the morphologically enhanced mammograms. Additionally, when the margins of masses were ill defined, only parts of the suspicious masses were extracted from the original mammograms. We therefore decided to use the proposed morphological operation as a preprocessing step for image enhancement prior to segmentation for the extraction of potential masses on the mammograms.
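
The trade-off reported above can be made explicit with a short calculation. The sketch below assumes that both candidate counts refer to the same 200 mammograms and that every extracted candidate that is not a true mass counts as a false candidate; the function name is illustrative.

```python
def detection_summary(candidates, true_hits, total_true, images):
    """Return (sensitivity, false candidates per image) for a pre-screening step."""
    sensitivity = true_hits / total_true
    false_per_image = (candidates - true_hits) / images
    return sensitivity, false_per_image


# Counts quoted in the text: original versus morphologically enhanced mammograms.
before = detection_summary(candidates=1142, true_hits=114, total_true=186, images=200)
after = detection_summary(candidates=3143, true_hits=181, total_true=186, images=200)
```

Under these assumptions the pre-screening sensitivity rises from about 61% to about 97%, while the number of false candidates per mammogram rises from about 5.1 to about 14.8, which is why a subsequent classification stage is needed.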

Based on the CBRL segmented region of interest, we employed a region growing method using a 4-neighbor connection assisted by a template masking operation to fill unconnected holes in the ROI:

IF $f(x-a, y-b) > V$ and $f(x, y) \in S$, THEN $f(x-a, y-b) \in S$   (1)

IF $f(x-d, y-d) \in S$, THEN $f(x-t, y-s) \in S$ for $t < d$ and $s < d$   (2)

where V denotes the threshold value of the originally CBRL segmented ROI, S represents the set of the growing region, and (a, b) takes one of four values (i.e., [1, 0], [-1, 0], [0, 1], and [0, -1]) for the four neighboring pixels. In eq. (2), d is the size of the template. In practice, we found that d should be set at 5 pixels to fill the holes without disrupting the boundary.
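
The growing rule of eq. (1) can be sketched as a breadth-first search over 4-connected neighbors. This is a minimal illustration under stated assumptions: the image is a list of rows indexed as (row, column), the seed pixels stand in for the CBRL segmented region, and the template-based hole filling of eq. (2) is not implemented. The function name is hypothetical.

```python
from collections import deque


def grow_region(image, seeds, threshold):
    """Grow a region S from seed pixels: a 4-neighbor joins S when its value exceeds V."""
    rows, cols = len(image), len(image[0])
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        # The four (a, b) offsets of eq. (1).
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and image[nr][nc] > threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 9],
]
region = grow_region(image, seeds=[(1, 1)], threshold=5)
```

The bright pixel at (3, 3) is not reached because it has no 4-connected path of above-threshold pixels to the seed, which is the behavior the 4-neighbor rule is meant to provide.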

3.2.  Feature  extraction  of  the  masses 

Feature  extraction  methods  play  an  essential  role  in  many  pattern  recognition  tasks.  Once  the  features 
associated  with  an  image  pattern  are  extracted  accurately,  they  can  be  used  to  distinguish  one  class  of  patterns  from 
the  others.  Recently,  many  investigators  have  found  that  the  multilayer  perceptron  (MLP)  neural  network  using  the 
error  backpropagation  training  technique  is  a  very  powerful  tool  to  serve  as  a  classifier  [22,  23].  In  fact,  the  use  of 
MLP  neural  network  system  for  classification  of  disease  patterns  has  been  widely  applied  in  the  field  of  computer- 
aided  diagnosis  [24-28]. 

The  success  of  using  a  classifier  for  a  pattern  recognition  task  would  rely  on  two  factors:  (a)  selected 
features  that  could  describe  a  discrepancy  between  image  patterns  and  (b)  accuracy  of  the  feature  computation. 
Should  either  one  fail,  no  analyzer  or  classifier  would  be  able  to  achieve  an  expected  performance.  By  analyzing 
many  clinical  samples  of  various  sizes  of  masses,  we  found  that  the  peripheral  portion  of  the  mass  plays  an 
important  role  for  mammographers  to  make  a  diagnosis.  The  mammographer  usually  evaluates  the  surrounding 
background  of  a  radiodense  area  when  a  region  is  suspected. 

We used the CBRL segmented ROI to compute the center. Since the segmented ROIs were somewhat smaller than the mammographer's delineation and lay on the denser region of the suspected patch, the computed centers were quite close to the visual center. We then divided the boundary of the ROI into 36 sectors (i.e., 10° per sector) using 36 equi-angular dividers radiated from the center of the ROI. The following features were computed within each 10° sector of the region:

(a) "l" - the length from the center of the ROI to the boundary segment of the sector.

(b) "a" - cos(θ), where θ is the normal angle of the boundary.

(c) "g" - the average gradient of gray value on the segment along the radial direction, i.e., $g = \frac{1}{N} \sum_{i=1}^{N} g_i$, where $g_i$ is the gray-value gradient at pixel i and N is the number of pixels along the radial direction from l/3 inside the boundary to the boundary (see the left l/3 line segment, Figure 3). Technically speaking, this set of gradient values may also serve as a fuzzy system on the input layer in the neural network (to be described in Section 3.3).

(d) "c" - the gray value difference (i.e., contrast) along the radial direction. Specifically,

$c = \frac{1}{P} \sum_{i=1}^{P} h_i - \frac{1}{P} \sum_{o=1}^{P} b_o$,

where $h_i$ (or $b_o$) represents a pixel value along the radial direction. The position l/3 inside the boundary is the center of pixels $h_i$ (i = 1, 2, 3, ..., P), the position l/3 outside the boundary is the center of pixels $b_o$ (o = 1, 2, 3, ..., P), and P is the number of pixels equivalent to a segment of l/6 and was used for averaging (see Figure 3).
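
Two of the sector features can be sketched in a few lines. The example assumes that the ROI boundary is available as a list of (x, y) points and that the pixel values inside and outside the boundary have already been sampled along the radial direction; the binning convention (sector k covers angles from 10k° up to 10(k+1)°) and the function names are illustrative choices, not taken from the paper.

```python
import math


def sector_lengths(boundary, center, sectors=36):
    """Feature l: mean center-to-boundary distance in each equi-angular sector."""
    cx, cy = center
    width = 2 * math.pi / sectors
    sums = [0.0] * sectors
    counts = [0] * sectors
    for x, y in boundary:
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        k = min(int(angle / width), sectors - 1)   # guard against rounding at 360 degrees
        sums[k] += math.hypot(x - cx, y - cy)
        counts[k] += 1
    # Sectors with no boundary point are reported as None.
    return [s / n if n else None for s, n in zip(sums, counts)]


def contrast(inside, outside):
    """Feature c: mean of the P inside pixels minus mean of the P outside pixels."""
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# A circle of radius 10 sampled every degree (offset by half a degree).
circle = [
    (50 + 10 * math.cos(math.radians(k + 0.5)), 40 + 10 * math.sin(math.radians(k + 0.5)))
    for k in range(360)
]
lengths = sector_lengths(circle, center=(50, 40))
c = contrast(inside=[10, 12], outside=[4, 6])
```

For a circular boundary all 36 lengths are equal to the radius; an irregular or spiculated boundary would instead produce a varying profile of lengths around the circle.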

Hence, a total of 144 computed features (4 features/sector for 36 sectors) were used as input values for the classification of the ROI. The relationships between the computed features and the BI-RADS descriptors are discussed below:

(1) ROI Size -

The size of the ROI is provided by the 36 "l" values.

(2) ROI Shape (round, oval, lobulated, or irregular) -

The 36 "l" and 36 "a" values can describe the shape of the ROI.

(3)  ROI  Margin  (circumscribed,  microlobulated,  obscured,  ill-defined,  or  spiculate)  - 
The  36  and  36  values  can  describe  the  ROI  margin. 

(4)  ROI  Density  (fat-containing,  low  density,  isodense,  or  highly  dense)  - 

The  36  "c"  and  36  values  can  be  used  to  describe  the  density  distribution  of  the  ROI. 

In short, the selected features are closely associated with the main mass descriptors indicated in the BI-RADS. The reason for using 36 values for each nominated feature is four-fold: (a) because the mass boundary varies, it is difficult to describe an image pattern using a single value; (b) due to the general shape of the masses, the features of masses can be easily analyzed in the polar coordinate system; (c) in case some features are inaccurately computed in several directions due to structure noise, such as the slender lines in the breast, there may still exist a sufficient number of correct features; (d) generally, more accurate results can be produced by using subdivided parameters rather than global parameters in a pattern recognition task when the parameters are barely discernible and the sample sizes are sufficiently large. Other computational features (e.g., difference entropy [19] and other higher order features) are eligible but require further investigation.


Figure 3. A suspicious mass is delineated and shown as the shaded region. Contrast is computed by subtracting the averaged background pixel value (i.e., $b_o$, o = 1, 2, ..., P) from the averaged foreground value (i.e., $h_i$, i = 1, 2, ..., P).


3.3.  The  neural  network  structure  specifically  designed  for  the  extracted  boundary  features 

(A)  Multiple  paths  with  circular  networking  to  instruct  the  neural  network  in  analyzing  sector  features 

This paper focuses on neural network design and the arrangement of features for effective pattern recognition of ROIs. We designed several neural network connections between the input and the first hidden layers, as shown in Figure 4. In this neural network system, the first layer also functions as a correlation layer that transforms and encodes the signals from the input nodes into correlation features for further neural network processing. Figures 4(A), (B), and (C) illustrate the full connection, a self correlation (SC) network, and a neighborhood correlation (NC) network, respectively. Network connections with multiple sectors (i.e., 20°, 30°, 40°, and 50° of the neighborhood correlation) are grouped separately as independent NC paths. In the following study, we used four SC paths for a single sector and thirteen NC paths for four types of multi-sectors. The method of using the multiple correlation connections was motivated by our research experience in two-dimensional convolution neural networks (2-D CNN), where we found that more than 10 convolution kernels in the CNN were necessary in the detection of lung nodules and microcalcifications [25].

Compared to 2-D CNN systems, the computation required in the 1-D CNN (e.g., 144 input features) is relatively small. The combination of the networking paths described earlier for the MCPCNN was implemented in the C programming language. The internal computation algorithm used in the MCPCNN shares the same convolution process as that in the 2-D CNN [25]. Rotation invariance and flip invariance were employed in training the 1-D convolution kernels in the MCPCNN.
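
The weight sharing along the circle of sectors can be sketched as a 1-D circular convolution. This is an illustration under stated assumptions rather than the authors' C implementation: each sector carries 4 features, the kernel holds one weight vector per covered sector (a span of 1 corresponds to a self correlation path; spans of 2 to 5 correspond to neighborhood correlation over 20° to 50°), and the kernel is anchored at the first sector it covers. All names and values are hypothetical.

```python
def circular_conv(features, kernel):
    """Apply one shared kernel at every sector, wrapping around the circle.

    features: list of per-sector feature vectors (e.g. 36 vectors of length 4).
    kernel:   list of per-offset weight vectors, reused for all sectors.
    """
    n = len(features)
    out = []
    for s in range(n):
        total = 0.0
        for k, weights in enumerate(kernel):
            sector = features[(s + k) % n]   # modulo gives the circular path
            total += sum(w * v for w, v in zip(weights, sector))
        out.append(total)
    return out


features = [[s, (s * s) % 7, 1, -s] for s in range(36)]   # toy (l, a, g, c) values
kernel = [[1, 0, 2, 0], [0, 1, 0, -1], [3, 0, 0, 1]]       # spans 3 sectors (30 degrees)

response = circular_conv(features, kernel)
rotated = circular_conv(features[5:] + features[:5], kernel)
```

Rotating the ROI by five sectors simply rotates the response by five positions (`rotated == response[5:] + response[:5]`), which is the property that makes shared circular kernels a natural fit for sector features whose starting angle is arbitrary.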


Figure  4.  Three  types  of  network  paths  connecting  the  input  and  the  hidden  layers  in  the  MCPCNN: 

(A)  Full  connection. 

(B)  A  self  correlation  (SC)  path;  each  node  on  the  layer  connects  to  a  single  set  of  the  features  (l,a,g,c)  for  the  fan-in 
and  fully  connects  to  the  hidden  nodes  for  fan-out. 

(C)  A  neighborhood  correlation  (NC)  path;  each  node  on  the  layer  connects  to  the  input  nodes  of  adjacent  sectors 
for  the  fan-in  and  fully  connects  to  the  hidden  nodes  for  fan-out. 

The fan-in nets emphasizing self correlation in (B) and neighborhood correlation in (C) represent convolution weights (i.e., the same type of sectors possess the same set of weighting factors).

The  fully  connected  neural  network  is  a  conventional  feed-forward  MLP  neural  network.  The  signals  of  the 
fully  connected  neural  network  join  the  other  network  processes  (i.e.,  SC  paths  and  NC  paths)  at  the  single  node  of 
the  output  layer.  The  signal  received  at  the  output  node  is  scaled  between  0  and  1.  During  the  training,  0  and  1  were 
assigned  at  the  output  node  to  perform  backpropagation  computation  for  a  non-mass  and  a  mass,  respectively.  The 
backpropagation  is  computed  in  such  a  way  that  the  computed  incremental  errors  (see  equations  (9)  and  (10)  are 
retraced  into  every  independent  network  path.  Excluding  the  output  layer,  the  SC  and  NC  signals  are  independently 
arranged  and  are  processed  through  two  types  of  one-dimensional  convolution  processes  in  the  forward  propagation. 
The  learning  algorithms  for  all  three  types  of  circular  network  paths  are  based  on  the  backpropagation  training 
method. 
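The circular, shared-weight processing of the SC and NC paths can be illustrated with a minimal Python sketch. It assumes that each hidden node of a path sits at one sector position and applies the same kernel to one sector (SC) or to several adjacent sectors (NC), wrapping around the ring of sectors. The function names, the toy input, and the kernel values are hypothetical and are not taken from the authors' C implementation.

```python
import math


def sigmoid(z):
    """Logistic function; maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def circular_path(sectors, kernel, bias=0.0):
    """Slide one shared kernel around the ring of sectors.

    sectors: list of per-sector feature lists, e.g. 36 x [l, a, g, c].
    kernel:  list of weight lists, one per covered sector
             (length 1 resembles an SC path, length k > 1 an NC path).
    Returns one hidden activation per sector position.
    """
    count = len(sectors)
    activations = []
    for start in range(count):
        total = bias
        for offset, weights in enumerate(kernel):
            features = sectors[(start + offset) % count]  # wrap around the ring
            total += sum(w * x for w, x in zip(weights, features))
        activations.append(sigmoid(total))
    return activations


# Hypothetical toy input: 36 sectors with 4 features each.
sectors = [[(s % 5) / 5.0, 0.1, 0.2, 0.3] for s in range(36)]
sc_kernel = [[0.5, -0.2, 0.1, 0.3]]                          # covers one sector
nc_kernel = [[0.5, -0.2, 0.1, 0.3], [0.1, 0.1, 0.0, -0.4]]   # covers two sectors

sc_out = circular_path(sectors, sc_kernel)
nc_out = circular_path(sectors, nc_kernel)
print(len(sc_out), len(nc_out))
```

Because the kernel is shared and the indexing wraps around, rotating the input by one sector simply rotates the path output by one position, which is the property the rotation-invariant training relies on.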

Let $V^{0}(n, s)$ represent an input signal at node $n$ and sector $s$. The signal processed through the SC paths and
received at each node, $n$, on the first hidden layer is

$$N_{SC}(n) = \sum_{i} V^{0}(n', s) \otimes W_{i}(n'; n) \tag{3}$$

where $\otimes$ stands for the convolution operation and $W_{i}(n'; n)$ is the weight factor connected to node $n$ from node $n'$ through
a self correlation path, $i$, regardless of the sector. The signal processed through the NC paths and received at
each node, $n$, on the first hidden layer is

$$N_{NC}(n) = \sum_{j} V^{0}(n', s') \otimes W_{j,s'}(n'; n), \quad s_1 \le s' \le s_2 \tag{4}$$

where $W_{j,s'}(n'; n)$ is the weight factor connected from node $n'$ of sector $s'$ through a neighborhood correlation path, $j$,
and $(s_1, s_2)$ is the range of sectors for the neighborhood correlation. The total signal received at each node, $n$, on the
first hidden layer is therefore

$$V^{1}(n) = S(N^{1}(n)) \quad \text{and} \quad N^{1}(n) = N_{F}(n) + N_{SC}(n) + N_{NC}(n) + b^{0}(n) \tag{5}$$

where $N_{F}(n)$ is the processed signal contributed by the full connection, $b^{0}(n)$ represents the bias at node $n$, and $S(z)$ is a
sigmoid function given by

$$S(z) = \frac{1}{1 + \exp(-z)} \tag{6}$$

The sigmoid function produces values ranging from 0 to 1. The signals on the other hidden layers in
each path are processed in the same way as in a conventional fully connected neural network. Other than for the first hidden
layer, the signals received at a hidden layer, $l$, collected from the previous hidden layer, $l-1$, are given by

$$V^{l}(n) = S(N^{l}(n)) = S\left(\sum_{n'} V^{l-1}(n')\, W^{l}(n'; n) + b^{l-1}(n)\right) \tag{7}$$

where $n'$ and $n$ denote the nodes at layers $l-1$ and $l$, respectively.

Let the $t$-th change of the weight be $\Delta W_{p}^{l}(n', s'; n)[t]$ and the $t$-th change of the bias be $\Delta b_{p}^{l}[t]$. The error function is
defined as

$$E = \frac{1}{2}(T - O)^{2} \tag{8}$$

where $T$ and $O$ denote the target output value and the actual output value, respectively, when the input values,
$V^{0}(n', s')$, are entered in the network. In this model, the error backpropagation algorithm, which updates the kernel
weights, is given below:

$$\Delta W_{p}^{l}(n', s'; n)[t+1] = \eta \left( \sum_{n} \sum_{s} \delta_{p}^{l}(n', s'; n, s) \cdot V_{p}^{l-1}(n, s) \right) + \alpha\, \Delta W_{p}^{l}(n', s'; n)[t] \tag{9}$$

$$\Delta b_{p}^{l}[t+1] = \eta \sum_{n} \sum_{s} \delta_{p}^{l}(n', s'; n, s) + \alpha\, \Delta b_{p}^{l}[t] \tag{10}$$

$$\delta_{p}^{l}(n', s'; n, s) = S'(N^{l}(n', s')) \left[ \sum_{n} \sum_{s} \delta^{l+1}(n, s) \times W^{l+1}(n', s'; n) \right] \tag{11}$$

In the case of the last layer,

$$\delta^{l}(n) = S'(N^{l}(n))\,(T(n) - O(n)) \tag{12}$$

where $S'(z)$, $\eta$, $\alpha$, and $T$ denote the derivative of $S(z)$, the learning rate, the weighting factor contributed by the
momentum term, and the desired output, respectively. Furthermore, $s$ or $s' = 1$ and $p = 1$ when $l \neq 0$.

During the training, we added an isotropic constraint to the weights of the 1-D convolution kernels so that

$$W_{q}(n, -s) = W_{q}(n, s) \tag{13}$$

where $q$ is not the fully connected path. These additional constraints induce the kernels to function as
correlation-processing filters and could help the algorithm search for an appropriate filter.
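The momentum form of the weight update and the isotropic constraint on the 1-D kernels can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, the gradient term is passed in as a precomputed list, and enforcing the symmetry by averaging mirrored weights after each update is an assumption (the text states only that mirrored weights are equal).

```python
def momentum_step(previous_change, gradient_term, learning_rate, momentum):
    """New change = learning_rate * gradient term + momentum * previous change."""
    return [learning_rate * g + momentum * d
            for g, d in zip(gradient_term, previous_change)]


def symmetrize(kernel):
    """Make a 1-D kernel mirror-symmetric about its center.

    Assumption: the constraint is imposed by averaging each weight
    with its mirror image; the source does not say how it is enforced.
    """
    size = len(kernel)
    return [(kernel[i] + kernel[size - 1 - i]) / 2.0 for i in range(size)]


# Hypothetical values for one update of a 3-weight kernel.
kernel = [1.0, 2.0, 5.0]
previous_change = [0.1, 0.0, -0.1]
gradient_term = [1.0, 2.0, 3.0]

change = momentum_step(previous_change, gradient_term,
                       learning_rate=0.5, momentum=0.9)
kernel = symmetrize([w + c for w, c in zip(kernel, change)])
print(kernel)
```

After the symmetrizing step the first and last weights are equal, so the kernel responds identically to a sector pattern and to its mirror image.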


(B)  Resampling  the  training  set  through  utilization  of  rotation  and  flip  invariance  of  the  features 

In this neural network model, there are no starting and ending sectors. The forward and backpropagation
computation can start from any sector. For a flipped patch, the characteristics of the mass features should remain
the same. To take advantage of this flip invariance, the same numerical target value can be assigned at the output
node for the flipped image patch in order to double the number of cases during training.

Since we designed a 10° increment for each rotation, each SC or NC path is processed 36 times using
the defined features for each image patch. To simplify this network computation, we shifted one small sector (4
nodes) on the input layer at a time to conduct the circular convolution process with the SC and NC kernels in the
following experiments. By reversing the sequence of the sectors, one can train on the flipped version of the suspicious
masses. Hence, using the rotation invariance and flip invariance in the neural network training
effectively increases the size of the training set by a factor of 72 (36 rotations x 2 flips).
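The factor of 72 can be checked with a short sketch that generates every circular shift of the sector sequence and of its reversed (flipped) copy. The function name and the toy feature values are hypothetical; the sketch assumes 36 sectors of 4 features each, as described above.

```python
def rotations_and_flips(sectors):
    """Return every circular shift of the sector list and of its mirror image."""
    count = len(sectors)
    flipped = sectors[::-1]  # reversing the sector order mirrors the patch
    variants = []
    for shift in range(count):
        variants.append(sectors[shift:] + sectors[:shift])
        variants.append(flipped[shift:] + flipped[:shift])
    return variants


# Hypothetical patch: 36 sectors, each with 4 feature values.
patch = [[s, 0.1 * s, 0.2, 0.3] for s in range(36)]
variants = rotations_and_flips(patch)
print(len(variants))  # 72
```

All variants of a patch share the same target value, so one labeled ROI contributes 72 training presentations.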

In  summary,  we  have  developed  a  complete  detection  procedure  for  the  automatic  recognition  of 
mammographic  masses  including  background  adjustment,  contrast  enhancement,  ROI  segmentation,  feature 
extraction,  and  MCPCNN  system  with  a  training  method.  Figure  5  shows  a  flow  diagram  for  the  essential  sections 
of  the  computational  procedures. 


[Figure 5 flow diagram: segmentation of the mass; compute l, a, g, c values; rotate all sets of the features for the NN training; submit features to the NN for training or testing; output nodes.]

Figure 5. A schematic diagram, showing the MCPCNN and sector features of masses, that was used in the following study.


4.  Experiments  and  Results 

The 200 mammograms were selected from the Mammographic Image Analysis Society (MIAS) database and
the Brooke Army Medical Center (BAMC) database created by the Department of Radiology at Georgetown
University Medical Center. Of the 200 mammograms, 50 are normal, and each of the 150 abnormal
mammograms contains at least one mass of varying size, subtlety, and location. Both the cranio-caudal (CC)
and medio-lateral oblique (MLO) projection views were used. The films were digitized at
2048x2500x12 bits (for an 8"x10" area, where each image pixel represents a 100 μm square). Ninety-one
mammograms, each either a CC or an MLO view film, were selected from 91 patient film jackets. No two mammograms
were selected from the same patient. All the digitized mammograms were reduced to 512x625x12 bits using
4x4 pixel averaging before the method was applied. According to radiologists, small masses are 3-15 mm
in effective diameter. A 3 mm object in an original mammogram occupies 30 pixels in an image digitized at
100 μm resolution. After the image size is reduced by a factor of four, the object occupies about 7-8 pixels.
An object 7 pixels in size is expected to be detectable by any computer algorithm. After pre-processing
and an object screening based on the circularity test and the size test (between 3 mm and 30 mm), a total of 125
suspicious areas were selected from the testing mammograms (91 cases) for this study. Specifically, the screening
procedure for reducing false-positives involved two steps: (a) image patches with circularity less than 0.25 or diameter
greater than 30 mm were eliminated, and (b) a probability modular neural network (PMNN) was used to rule out the majority
of false-positives. Of the 125 suspicious areas, 75 ROIs contained masses based on the corresponding biopsy reports
and the reading of one experienced radiologist. This set of ROIs was used in [19] and is discussed in Figure 6 and Table II
of [19].
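The size reduction and the pixel arithmetic above can be reproduced with a small sketch. It assumes non-overlapping 4x4 blocks and a plain arithmetic mean; the function names and the 8x8 toy image are hypothetical.

```python
def block_average(image, k=4):
    """Reduce a 2-D list by averaging non-overlapping k x k blocks."""
    rows = len(image) - len(image) % k
    cols = len(image[0]) - len(image[0]) % k
    reduced = []
    for r in range(0, rows, k):
        reduced.append([
            sum(image[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(0, cols, k)
        ])
    return reduced


def diameter_in_pixels(diameter_mm, pixel_um=100.0, reduction=4):
    """Object diameter in pixels after digitizing and size reduction."""
    return diameter_mm * 1000.0 / pixel_um / reduction


tiny = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
small = block_average(tiny)          # 8x8 -> 2x2
print(len(small), len(small[0]))     # 2 2
print(diameter_in_pixels(3.0))       # 7.5
print(diameter_in_pixels(15.0))      # 37.5
```

With the same 4x4 averaging, a 2048x2500 image becomes 512x625, and the 3-15 mm range of small masses maps to roughly 7.5-37.5 pixels in the reduced image.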

4.1.  Experiment  1 

We randomly selected 54 computer-segmented ROIs, of which 30 patches matched the radiologist's
identification and 24 did not. This database was used to train two neural network systems: (1) a conventional
3-layer neural network and (2) the proposed MCPCNN training method using the same neural network learning
algorithm. The structure of the MCPCNN was described earlier. In the study, we used one fully connected path,
four SC paths, four NC paths covering 2 sectors, four NC paths covering 3 sectors, three NC paths covering 4
sectors, and two NC paths covering 5 sectors in the first step network connection for the MCPCNN. Every path in the
neural network has its own hidden layer; only one hidden layer per path was used. Both neural network systems
were trained by the error backpropagation algorithm by feeding the features from the input layer and registering the
corresponding target value at the output node. The training was considered complete when the mean square error
($\sum_{i=1}^{N} (T_i - O_i)^2 / N$, where $N$ is the number of samples) was reduced to approximately 3x10^[exponent illegible]. Once the
training of the neural networks was completed, we used the remaining 71 computer-segmented ROIs for
testing. Forty-five of the 71 ROIs were masses and 26 were not. Neither the images nor their corresponding
patients in the testing set appeared in the training set. The neural network output values were fed into the
LABROC4 program [29] for the performance evaluation. The areas (Az) under the
receiver operating characteristic (ROC) curves were 0.7869±0.0536 and 0.8443±0.0457 using the conventional
neural network (MLP) and the MCPCNN, respectively. The ROC curves of these two neural network systems are
shown in Figure 6(A). The Az value of 0.7869±0.0536 was obtained using the MLP with 125 hidden nodes.
The performance of the MLP remained about the same, at an Az of 0.7809±0.0551, using the same neural network
parameters but with 30 hidden nodes.

We also invited another senior mammographer to conduct an observer study using the ROC study protocol.
The mammographer was asked to rate each patch on a numerical scale from 0 to 10 for its likelihood of being a
breast mass. The image patches were displayed on a SUN monitor (Model: GDM-20D10). The image size shown
on the monitor was reduced to approximately 7"x9", compared with the original film size (8"x10"). These 71
ratings were also fed into the LABROC4 program. The Az of the mammographer's performance on this set of test
cases was 0.909±0.0340. The corresponding ROC curve is also shown in Figure 6(A).
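For readers who want a quick check of an ROC area from raw scores, the sketch below computes the empirical (nonparametric) area as the fraction of mass/non-mass pairs that are ranked correctly, with ties counted as one half. This is not the method used by LABROC4, which fits a binormal model, so its Az values will generally differ somewhat from this estimate. The function name and the sample ratings are hypothetical.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical ratings on the 0-10 scale (1 = mass, 0 = non-mass).
ratings = [9, 2, 8, 1]
truth = [1, 1, 0, 0]
print(empirical_auc(ratings, truth))  # 0.75
```

The same function accepts network outputs in the range 0 to 1, since only the ordering of the scores matters.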

4.2.  Experiment  2 

We also conducted a leave-one-case-out experiment (i.e., a jackknife procedure) using the same database. In each round of this
experiment, we used the image patches extracted from 90 of the 91 mammograms (one mammogram per case) for
training and used the image patches (in most cases a single patch) extracted from the remaining mammogram as test
objects. The procedure was repeated 91 times so that every ROI extracted from each mammogram was tested in
the experiment. For each individual ROI, the computed features were identical to those used in Experiment 1.
Again, the training was stopped when the mean square error was approximately equal to 3x10^[exponent illegible]. Both neural
network systems were independently trained and evaluated with the same procedure. The
Az values were 0.7985±0.0394 and 0.8866±0.0289 using the conventional neural network (MLP) and the
MCPCNN, respectively. The performance of the MLP decreased to an Az of 0.7608±0.0429 using the same neural
network parameters but with 30 hidden nodes. Figure 6(B) shows the ROC curves of these two neural network
systems using the leave-one-out procedure [30] in the experiment.
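The leave-one-case-out split can be sketched as a generator that holds out all ROIs of one case per round, so that no case contributes to both training and testing. The tuple layout `(case_id, features, label)` and the sample data are hypothetical.

```python
def leave_one_case_out(rois):
    """Yield (held_out_case, training_rois, test_rois), one round per case.

    rois: list of (case_id, features, label) tuples.
    """
    case_ids = sorted({case_id for case_id, _, _ in rois})
    for held_out in case_ids:
        training = [roi for roi in rois if roi[0] != held_out]
        testing = [roi for roi in rois if roi[0] == held_out]
        yield held_out, training, testing


# Hypothetical data: case "a" produced two ROIs, the others one each.
rois = [
    ("a", [0.1, 0.2], 1),
    ("a", [0.3, 0.1], 0),
    ("b", [0.5, 0.9], 1),
    ("c", [0.4, 0.4], 0),
]
rounds = list(leave_one_case_out(rois))
for case_id, training, testing in rounds:
    print(case_id, len(training), len(testing))
```

With 91 cases and 125 ROIs, a round that holds out a single-ROI case trains on 124 ROIs.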


[Figure 6 plots: ROC curves in two panels, (A) and (B); horizontal axis: FPF.]

Figure 6. The ROC curves obtained from the corresponding experiments.

(A) The left figure shows that the performance of the MCPCNN training method is superior to that of the conventional
MLP method. The highest curve is the ROC performance of the senior mammographer.

(B) The right figure shows that the ROC results improved in both neural network systems when the leave-one-case-out
procedure was used. The MCPCNN still showed higher performance than the conventional MLP method.

We also used the CLABROC program [31] to analyze the ROC data and compare the ROC results. The results and
their statistical significance, using a two-tailed p value of 0.05 as the threshold, are shown in Table I. In the first
experiment, the radiologist's performance was significantly better than that of the conventional neural network system (p = 0.0447).
In the second experiment, the MCPCNN was superior to the MLP with a statistically significant result (p = 0.0241).

Table I. ROC Performance of the Test Methods in Distinguishing True and False Masses

Comparative Analyses of Methods     Az of Method (1)    Az of Method (2)    P Value        Statistical Significance
Experiment 1
(1) Radiologist vs. (2) MCPCNN      0.909 ± 0.0340      0.8443 ± 0.0457     [illegible]    No
(1) Radiologist vs. (2) MLP         0.909 ± 0.0340      0.7869 ± 0.0536     0.0447         Yes
(1) MCPCNN vs. (2) MLP              0.8443 ± 0.0457     0.7869 ± 0.0536     [illegible]    No
Experiment 2
(1) MCPCNN vs. (2) MLP              0.8866 ± 0.0289     0.7985 ± 0.0394     0.0241         Yes

5.  Discussion 

It  is  known  in  the  field  of  artificial  intelligence  that  the  key  factors  in  pattern  recognition  are:  (1)  effective 
methods  in  the  extraction  of  features  and  (2)  classification  methods  for  the  extracted  features.  In  this  study,  we 
showed  that  the  training  method  designed  to  guide  the  analyzer  is  also  an  important  factor  for  a  pattern  recognition 
task.  Though  this  finding  is  not  new,  the  research  of  developing  training  methods  for  various  pattern  recognition 
tasks  has  not  been  established  in  the  field  of  medical  imaging.  Our  studies  demonstrated  that  with  proper  network 
connections  and  task-oriented  guidance,  organized  features  would  assist  the  neural  network  in  performing  the  task. 

Technically speaking, a feed-forward MLP neural network provides an integrated process for classification and
sometimes for feature extraction. The output values of the hidden nodes can be interpreted as a reorganized set of
features presented to the output layer for classification. The drawback of the MLP is that the user has very little
control over, and little understanding of, the network learning. The MCPCNN is a network design that partially
remedies these issues and is applicable to any pattern recognition task associated with ROIs. The MCPCNN (a
member of the CNN family) possesses shared weights in the hidden layer(s) that act as filter kernels for extracting
correlated features. With a higher resolution mammogram, a finer sector (<10°) would be preferred for the analysis
of masses, especially for the study of classification of masses. During forward and backpropagation training, the kernels
would comply with both signals from input and output layers for all training cases, so as to maximize the
classification performance. One reason that we do not recommend using the 2DCNN for the detection of masses is that the
sizes of masses vary: the 2DCNN would require a large fixed input size to cover the maximum mass size.
The variety of mass shapes and potentially long spiculated patterns make the use of the 2DCNN impractical. Since
the MCPCNN processes the features computed from sectors, it does not limit the sizes of its ROIs. Best of all, the
MCPCNN also has the ability to classify partially obscured masses. The 2DCNN, however, would be more
appropriate for the detection of microcalcifications and small lung nodules.

As far as research in the detection of masses is concerned, we have shown that the use of the MCPCNN with sector
features is an effective approach. Since the MCPCNN coordinates the input data and performs correlation between
features of adjacent sectors in the first stage of data processing, the internal neural network learning algorithm can be
changed if another learning algorithm is found to be more effective. In fact, the MCPCNN is a technique that can
effectively classify features arranged in the polar coordinate system. A technique using the rubber band
straightening transformation, independently developed by Sahiner et al. [12] for the detection of masses, also
employs a similar concept in extracting features and/or texture in the polar coordinate system. We believe that
integration of features and texture values computed at small sectors will be the research trend in mass detection and
tumor classification.

6.  Conclusions 

In the clinical course of detecting masses, mammographers usually evaluate the surrounding background of
a radiodense area when an ROI is suspected. In this study, we simulated this fundamental concept with a neural
network system (i.e., the MCPCNN). In order for the MCPCNN to function, boundary features of the suspicious region
in each radial sector were computed. We found that the MCPCNN is capable of analyzing correlated features within
the sector and between adjacent sectors, which led to an improvement in detecting mammographic masses.

Through this study, we found that the selected features are somewhat effective in the detection of masses.
These features were "computationally translated" from the qualitative descriptors of BI-RADS. These features can be
extended to improve mass detection, but this task is beyond the scope of this paper. In the
preliminary studies shown above, we found that the MCPCNN coupled with the proposed training method produced
better results than the conventional neural network. We found that the performance of both neural network
systems improved in Experiment 2. This may be because the number of training samples
increased from 54 to 124. In Experiment 2, the Az value improved by 0.042 with the MCPCNN,
compared with an improvement of 0.012 with the conventional training method. The results imply that the
MCPCNN learned more effectively than the conventional neural network when the number of training cases was
increased. With the use of a larger database and advanced texture features proposed by others, it is expected that the
performance of the MCPCNN should improve significantly. This paper does not intend to claim the best mass
detection system in comparison with similar systems; rather, its goal is to report a potentially better neural network
structure for analyzing a set of mass features.

Abstract: A neural network based framework has been developed to search for an optimal wavelet kernel that is tailored for a
specific image processing task. In this paper, a linear convolution neural network was used to seek a wavelet that minimizes
errors and maximizes compression efficiency for an image or a defined image pattern such as microcalcifications on
mammograms. We have used this method to evaluate the performance of 4-tap wavelets on mammograms, CTs, MRIs, and
Lena images. We found that the Daubechies wavelet, or wavelets possessing similar filtering characteristics, produces the
highest compression efficiency with the smallest mean-square-error. However, the Haar wavelet produces the best results on
sharp edges and low-noise smooth areas. We also found that a special wavelet, whose low-pass filter coefficients are
(0.32252136, 0.85258927, 0.38458542, -0.14548269), can greatly preserve the microcalcification features in peak
signal-to-noise ratio, contrast, and figure of merit during compression. Explanations of the experimental results are
provided by reviewing the spectrum of wavelet filters. This newly developed optimization method can be generalized to other
image analysis applications where a wavelet decomposition is employed.

Keywords:  Optimization  of  wavelet,  neural  network,  wavelet  decomposition, 

image  feature  restoration,  and  image  compression. 


I. Introduction

In the field of transform coding, discrete cosine transform (DCT) based decomposition methods were developed
extensively in the 1970s and 1980s. Most of these techniques are associated with block DCT [1]-[4]. However, several
investigators have indicated that the use of full-frame DCT [5]-[7] can produce high compression efficiency with high data
fidelity and without blocky artifacts. This method is particularly appropriate for high-resolution, large-sized images. Recently,
sub-band and wavelet transformations have been widely used in image compression research [8]-[10]. Unlike DCT coefficients, wavelet
transform coefficients are partially localized in both the spatial and frequency domains, and form a multiscale representation of
the image. In addition, wavelet transform coefficients possess orientation specificity. Because the wavelet transform has these
attractive features, many efforts have been made to effectively encode transformed coefficients [11]-[13]. As a result, it has
been shown that the wavelet transform, in many situations, produces significantly better results than block
DCT techniques in image compression.

The wavelet decomposition is determined by one mother wavelet function and its dilated and shifted versions. Since the
mother wavelet functions are not unique, different wavelet bases can produce different wavelet coefficients. Investigators
often have difficulty selecting an optimal wavelet for a specific image processing application. Many technical issues
relating to this area remain unsolved. The choice of optimal wavelet depends on different criteria in various applications. The
early work on selecting a wavelet basis can be found in [14]-[16]. In [16], Tewfik et al. proposed a method for selecting a
wavelet for signal representation based on minimizing an upper bound of the norm of the error in approximating the signal up
to a desired scale. In [14], Coifman and Wickerhauser developed an entropy-based algorithm for choosing the best wavelet
basis that can achieve higher compression ratios with a generalization of wavelet packets, at the expense of increased
processing time. Villasenor et al. derived a wavelet filter evaluation metric according to the filter impulse response and step
response in addition to regularity. Based on this metric, some of the best filters suitable for image compression were selected
from a biorthogonal wavelet filter bank [15].

In the field of image compression, one major criticism of the evaluation of reconstructed images is that investigators
usually present a global measure of a large image rather than a local quantification of a specific image pattern. The former
only provides the overall performance of a compression technique on an image. In many applications, particularly in medical
imaging, a local image pattern can be of major concern. In such a case, the performance at the region of interest (ROI) should
be weighted much higher than that of other places. Our research goal is to investigate which wavelet filter produces the best
compression result for a given image pattern. In our experiment, we isolated various types of ROIs on several medical
imaging modalities for the evaluation of data fidelity and compression ratio using wavelet decomposition techniques.

On  the  selection  of  an  optimal  wavelet,  we  propose  to  use  a  linear  convolution  neural  network  system  that  possesses  an 
embedded  wavelet  operation.  Through  a  controlled  backpropagation  algorithm,  the  neural  network  is  capable  of  searching  for 
an  optimal  wavelet  that  minimizes  quantization  errors  and  at  the  same  time  produces  the  highest  compression  efficiency.  This 
newly  developed  convolution  neural  network  method  can  also  be  extended  to  evaluate  various  wavelets  in  preserving  defined 
image  features. 

The rest of this paper is organized as follows. In Section II, the discrete wavelet transform is briefly reviewed and a wavelet
based convolution neural network is described. In addition, migration from one wavelet kernel to another, which is embedded in
the searching method of the neural network system, is also presented. Section III describes the experimental method used to
evaluate the proposed approach. The results are given in Section IV. Section V discusses the results of the wavelet search with
the image patterns and the characteristics of the optimized wavelets. Section VI summarizes the technical achievements and their
implications in the field of medical image compression.


II.  Algorithm  Development 

II.A.  Two-Dimensional  Wavelet  Decomposition 

Following Mallat's 2-D wavelet analysis [9], the two-dimensional scaling function is composed of two one-dimensional
scaling functions in both directions if they are separable:

$$\Phi(x, y) = \phi(x)\,\phi(y) \tag{1}$$

where $\phi(x)$ is a scaling function. The associated two-dimensional wavelets are defined as

$$\Psi^{1}(x, y) = \phi(x)\,\psi(y) \tag{2}$$

$$\Psi^{2}(x, y) = \psi(x)\,\phi(y) \tag{3}$$

$$\Psi^{3}(x, y) = \psi(x)\,\psi(y) \tag{4}$$

where $\psi(x)$ is the 1-D wavelet corresponding to the 1-D scaling function. Using the sub-band coding algorithm, the
wavelet transform (2-D DWT) of a matrix has four parts:

$$\Phi_{LL}(f(x,y)) = \sum_{u,v} [(f(x,y)\,h(u-2x, 0))\,h(0, v-2y)] = \sum_{u,v} [f(x,y)\,h_{LL}(u-2x, v-2y)] \tag{5}$$

$$\Phi_{LH}(f(x,y)) = \sum_{u,v} [(f(x,y)\,h(u-2x, 0))\,g(0, v-2y)] = \sum_{u,v} [f(x,y)\,h_{LH}(u-2x, v-2y)] \tag{6}$$

$$\Phi_{HL}(f(x,y)) = \sum_{u,v} [(f(x,y)\,g(u-2x, 0))\,h(0, v-2y)] = \sum_{u,v} [f(x,y)\,h_{HL}(u-2x, v-2y)] \tag{7}$$

$$\Phi_{HH}(f(x,y)) = \sum_{u,v} [(f(x,y)\,g(u-2x, 0))\,g(0, v-2y)] = \sum_{u,v} [f(x,y)\,h_{HH}(u-2x, v-2y)] \tag{8}$$

where the $h$ and $g$ functions are the low-pass and high-pass filters of the sub-band decomposition, subject to the condition

$$g(u) = (-1)^{u}\, h(1-u). \tag{9}$$

The low-pass filter, $h$, must also satisfy two criteria to construct the orthogonal basis of compactly supported wavelets [8], [9].
For simplicity, we also use $g_u$ and $h_u$ to replace $g(u)$ and $h(u)$, respectively, in this paper.


(a) $$\sum_{u} h_{2u} - \sqrt{2}/2 = \sum_{u} h_{2u+1} - \sqrt{2}/2 = 0; \tag{10}$$

(b) $\{h_u\}$ should be orthogonal; this means that

$$\sum_{u} h_{u}\, h_{u+2n} = \delta_{u,u+2n} \tag{11}$$

where $\delta_{i,j}$ is the Kronecker delta function and $n$ is an integer. However, a high degree of regularity and a high number of vanishing moments
were not imposed in this study, because these are very strong constraints from the compression point of view. Typically,
filters that perform perfect reconstruction are eligible for a data compression scheme. With a single low-pass filter
system, the criterion of perfect reconstruction is the same as that in (11), because the synthesis filter is the reflection function
of the analysis filter. For simplicity, we will use a single low-pass filter to perform the wavelet transform in the following
algorithm development.
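The two criteria on the low-pass filter can be checked numerically. The sketch below assumes that criterion (a) requires the even-indexed and odd-indexed taps each to sum to sqrt(2)/2 and that criterion (b) requires the filter to be orthonormal to its even shifts. The function names and the tolerance are illustrative choices; the "special" coefficients are those quoted in the abstract, which are printed to eight decimals and therefore satisfy the criteria only approximately.

```python
import math


def satisfies_sum_rule(h, tol=1e-5):
    """Criterion (a): even- and odd-indexed taps each sum to sqrt(2)/2."""
    target = math.sqrt(2.0) / 2.0
    return (abs(sum(h[0::2]) - target) < tol
            and abs(sum(h[1::2]) - target) < tol)


def satisfies_orthogonality(h, tol=1e-5):
    """Criterion (b): unit norm and zero overlap with every even shift."""
    for n in range(len(h) // 2):
        dot = sum(h[u] * h[u + 2 * n] for u in range(len(h) - 2 * n))
        expected = 1.0 if n == 0 else 0.0
        if abs(dot - expected) >= tol:
            return False
    return True


haar = [math.sqrt(2.0) / 2.0, math.sqrt(2.0) / 2.0]
special = [0.32252136, 0.85258927, 0.38458542, -0.14548269]  # from the abstract
box = [0.5, 0.5, 0.5, 0.5]  # a plain averaging filter, shown for contrast

for name, taps in [("haar", haar), ("special", special), ("box", box)]:
    print(name, satisfies_sum_rule(taps), satisfies_orthogonality(taps))
```

The Haar filter and the special 4-tap filter pass both checks, while the averaging filter fails them, which illustrates how restrictive the criteria are even without regularity or vanishing-moment constraints.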

The 2-D filters in the second forms of (5)-(8) are the vector products of the h and/or g filters. The relationship between the
high-pass and low-pass filters makes the unification of the four sets of decomposition possible, as shown in Section II.D.
According to wavelet theory, it is known that given a set of h, one can calculate the Fourier transforms of the scaling and
wavelet functions as follows:

$$\Phi(\omega) = H_{0}(\omega/2)\,\Phi(\omega/2) \tag{12}$$

$$\Psi(\omega) = H_{1}(\omega/2)\,\Phi(\omega/2) \tag{13}$$

where $H_0$ and $H_1$ are the Fourier transforms of the h and g filters, respectively. Hence, both the scaling and wavelet functions can be
obtained through infinite recursion by using (12) and (13), respectively.
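A one-level separable decomposition into the four sub-bands can be sketched in a few lines. The sketch makes several assumptions that the text does not state: periodic extension at the image borders, the finite-index form g[k] = (-1)^k h[L-1-k] of the high-pass relation (an even shift of g(u) = (-1)^u h(1-u) for an even filter length L), and filtering along rows (x) before columns (y). All function names are hypothetical.

```python
import math


def high_pass_from_low_pass(h):
    """g[k] = (-1)^k * h[L-1-k]; an even shift of g(u) = (-1)^u h(1-u)."""
    length = len(h)
    return [((-1) ** k) * h[length - 1 - k] for k in range(length)]


def analyze_1d(signal, taps):
    """Filter with periodic extension and keep every second output sample."""
    n = len(signal)
    return [sum(taps[k] * signal[(2 * x + k) % n] for k in range(len(taps)))
            for x in range(n // 2)]


def analyze_columns(matrix, taps):
    """Apply analyze_1d down every column of a 2-D list."""
    columns = [analyze_1d(list(col), taps) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


def dwt2_one_level(image, h):
    """Return the LL, LH, HL, and HH sub-bands of one decomposition level."""
    g = high_pass_from_low_pass(h)
    low_x = [analyze_1d(row, h) for row in image]    # low-pass along x
    high_x = [analyze_1d(row, g) for row in image]   # high-pass along x
    return {
        "LL": analyze_columns(low_x, h),
        "LH": analyze_columns(low_x, g),
        "HL": analyze_columns(high_x, h),
        "HH": analyze_columns(high_x, g),
    }


haar = [math.sqrt(2.0) / 2.0, math.sqrt(2.0) / 2.0]
flat = [[1.0] * 4 for _ in range(4)]
bands = dwt2_one_level(flat, haar)
print(bands["LL"])
print(bands["HH"])
```

For a constant image, all of the signal ends up in the LL band and the three detail bands are zero up to rounding; for an orthonormal filter the total energy of the four sub-bands equals that of the input.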

II.B. Construction of a Neural Network Using Wavelet Decomposition

The artificial neural network described in this paper is based on the linear convolution process used in sub-band
and wavelet decomposition techniques. Each wavelet processing step in the neural network performs exactly the same operation as the
conventional wavelet transform given in (5)-(8). Our approach is to use the searching capability of the neural network to
obtain the most suitable wavelet kernel through an image compression scheme [17]. In this paper, one major research task is
to minimize the error and simultaneously achieve the highest compression efficiency during the compression and
decompression processes. In order to match the sub-band decomposition, several characteristics of the neural network must
be established: (a) no hidden layer and only one output layer is used, (b) local connection through a convolution process, rather than fully
connected nets, is employed, and (c) the convolution process must be reversible (wavelet kernels are used in this paper). In
order to study the data fidelity, we add a quantization step to the compression process. Therefore, the image cannot be fully
reconstructed after the decompression process. The differences between the original and reconstructed images are not due to the
inverse transformation but to the inaccuracy of the quantized transform coefficients. Figure 1 shows the structure of
the neural network using quantized transform coefficients as the targets.

Minimization of quantization errors was not our only technical consideration. Entropy must also be minimized when
optimizing for data compression. We combined both goals by multiplying the mean-square-error function by an imposed
entropy reduction function. The cost (error) function for
searching the optimal wavelet kernel in the neural network becomes

$E_f(i,j) = Z(QT(i,j)) \times [T(i,j) - QT(i,j)]^2 / 2$  ...(14)

where QT(i,j) is the quantized transform coefficient at pixel (i,j) and Z(QT(i,j)), which is the entropy reduction function for a
set of quantization coefficients, is given below:

$Z(QT(i,j)) = \begin{cases} 0 & \text{for } QT(i,j) = 0 \\ 1 & \text{for } |QT(i,j)| = 1 \\ F(n,q) & \text{for } |QT(i,j)| = n \end{cases}$  ...(15)

F(n,q),  which  is  a  ramp  function,  is  a  function  of  quantization  factor,  q,  and  is  somewhat  inversely  proportional  to  the 
quantized  integer  n.  The  value  of  the  ramp  function  should  always  be  smaller  than  1. 
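
The cost in (14) and (15) can be sketched as follows. The text does not give the form of the ramp function, so `ramp` below is a hypothetical placeholder (1/n, which ignores q) chosen only because it decreases with n and stays below 1 for n >= 2. The sketch also assumes that QT is expressed in the same units as T, i.e. q times the rounded integer.

```python
import numpy as np

def ramp(n, q):
    """Hypothetical ramp F(n, q): decreases with n and is below 1 for n >= 2.
    The source does not specify its form; q is accepted but unused here."""
    return 1.0 / np.maximum(n, 1)

def cost(T, q):
    """Per-coefficient cost E_f = Z(QT) * (T - QT)**2 / 2 for quantization step q."""
    T = np.asarray(T, dtype=float)
    n = np.abs(np.rint(T / q))      # quantized integer magnitude
    QT = q * np.rint(T / q)         # quantized coefficient in the units of T (assumed)
    Z = np.where(n == 0, 0.0, np.where(n == 1, 1.0, ramp(n, q)))
    return Z * (T - QT) ** 2 / 2.0

# Coefficients that quantize to 0, 1, 2 and -4 with q = 16.
print(cost([3.0, 20.0, 36.0, -70.0], q=16))
```

With this placeholder ramp, coefficients that quantize to zero contribute no error, and errors from large quantized values are down-weighted relative to those from values of magnitude 1.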

Figure 1. A wavelet-based neural network system that was used to seek an optimal wavelet for minimization of the
quantization errors. T(i,j) and QT(i,j) denote transform and quantized coefficients in the high-frequency domain, respectively.
(Diagram labels: wavelet kernel; quantization errors to train the neural network.)

The entropy reduction function in (15) is designed this way for a fixed quantizer, q, for three reasons: (a) since most
low-value coefficients (-0.5q < T(i,j) < 0.5q) are associated with noise when q is not very large, there is no need to
backpropagate errors from output nodes whose quantized value is 0; (b) the more small quantized
values there are, the lower the ensemble entropy; and (c) the probability of turning a high quantized value into a low quantized
value is very low, so errors backpropagated from high quantized values should be emphasized less than those from low
quantized values (e.g., 1, 2, or so). When q is very small, the quantization error is in the range of global image noise. In this
case, the neural network relies on the guidance of the Z function to search for a wavelet filter that produces more low
transform values. The success of this cost (i.e., error) function design is demonstrated by the experiments in Section IV.

Based on the neural network shown in Figure 1, we can seek an optimal convolution kernel. The specific searching
algorithm is given in Section II.C. Section II.D shows a method to conduct orthogonal wavelet decomposition without using
the high-pass filter. Hence, the low-pass filter is the only kernel needed to process the signals through the 4 channels of a
two-dimensional (2-D) wavelet decomposition. In practice, the kernel directly suggested by the neural network in each epoch may
not be a wavelet kernel. Section II.E provides algorithms that modify the kernel to fulfill the requirements of a wavelet kernel.
Through this process, we can find a wavelet that produces the lowest quantization errors with the lowest entropy of the
quantized transform coefficients.


II.C. Signal Propagation through Convolution Process and Searching Methods in the Neural Network

The  signal  propagation  from  input  layer  to  output  layer  involving  convolution  computation  is  given  below  [18]: 

$T_c(i,j) = K_c(i,j) \otimes S(i,j)$  ...(16)

where $\otimes$ represents convolution, S(i,j) is the original image, subscript c denotes the channel number, and K_c(i,j) is the
convolution kernel for channel c. For the wavelet decomposition, the relationship between K_c(i,j) and the wavelet filters (i.e.,
h and g filters) will be given in Sections II.C and II.D.

Since  we  treat  the  wavelet  transform  as  a  locally  connected  neural  network,  the  well-known  backpropagation  (BP) 
method  can  be  used  as  a  searching  process  by  altering  the  2-D  convolution  kernels  in  each  epoch  [19].  For  each  composed 
signal,  a  linear  function  instead  of  a  typical  sigmoid  function  in  the  neural  network  system  is  used  in  this  process.  The 
updated  kernel  suggested  by  backpropagation  in  the  neural  network  is  given  by 


$K_c(u,v)[t+1] = K_c(u,v)[t] + \eta \sum_{i,j} \mu_c(i,j)\, S(u-i, v-j) + \alpha\, \Delta K_c(u,v)[t]$  ...(17)

where t is the iteration number during the searching, α is the gain for the momentum term received in the previous learning
loop, η is the gain for the current weight changes, and μ is the weight-update function which is given by

dKc{u,v)  ■ 

Since  a  2-D  DWT  is  composed  of  a  4-channel  wavelet  decomposition,  there  are  4  associated  convolution  kernels  to  be 
updated  simultaneously.  The  association  between  low-pass  and  high-pass  filters,  as  shown  in  (9),  is  a  necessary  constraint  of 
compact  support  in  orthogonal  wavelets.  In  order  to  preserve  this  property,  we  rearrange  the  decomposition  method;  hence, 
only  a  single  kernel  is  needed  to  perform  the  2-D  DWT  as  demonstrated  in  the  next  subsection. 
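
The update with a momentum term can be sketched in one dimension. This is a simplified illustration, not the authors' implementation: it uses a single 1-D kernel, a plain squared-error target instead of the cost in (14), correlation instead of convolution indexing, and hypothetical names (`train_kernel`, `eta`, `alpha`).

```python
import numpy as np

def train_kernel(s, target, taps, eta=0.005, alpha=0.5, epochs=500):
    """Search for a 1-D kernel K so that correlate(s, K) matches target.

    Per epoch: K[t+1] = K[t] + eta * sum_i delta(i) * s(i + u) + alpha * dK[t],
    with a linear output node and delta = target - output.
    """
    K = np.zeros(taps)
    dK = np.zeros(taps)
    for _ in range(epochs):
        out = np.correlate(s, K, mode="valid")        # forward pass, linear activation
        delta = target - out                          # error at each output node
        grad = np.correlate(s, delta, mode="valid")   # sum_i delta(i) * s(i + u)
        dK = eta * grad + alpha * dK                  # current change plus momentum
        K = K + dK
    return K

rng = np.random.default_rng(0)
s = rng.standard_normal(64)
k_true = np.array([0.4830, 0.8365, 0.2241, -0.1294])
K = train_kernel(s, np.correlate(s, k_true, mode="valid"), taps=4)
print(K)
```

Because the output node is linear, the error surface is quadratic in the kernel weights, so the search recovers the kernel that generated the target in this toy setting.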

II.D. Unification of the Four Channels Decomposition in 2-D DWT

Using  (6)  as  an  example  to  rewrite  the  decomposition  equation  by  replacing  the  g  with  the  h  filter,  we  have: 

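$W_{LH}(f(x,y)) = \sum_{u,v} [f(x,y)\, h(u-2x, 0)]\,[(-1)^{v}\, h(0, 2y+1-v)]$  ...(19)

or

$W_{LH}(f(x,y)) = \sum_{u,v} [((-1)^{y} f(x,-y))\, h(u-2x, 0)\, h(0, v-2y)]$

$\quad = \sum_{u,v} [((-1)^{y} f(x,-y))\, h_{LL}(u-2x, v-2y)] = \sum_{u,v} [f_{LH}(x,y)\, h_{LL}(u-2x, v-2y)]$  ...(20)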

Converting  (7)  and  (8)  to  use  the  2-D  low-pass  filter  as  the  kernel  is  a  matter  of  changing  the  orientation  from  y-  to  x- 
direction.  These  conversions  also  indicate  that  one  can  use  a  single  2-D  filter  to  compute  the  four  quadrants  of  the  2-D 
wavelet  transform  by  flipping  the  matrix  position  in  x-  and/or  y-directions  and  alternating  the  sign  of  the  flipped  matrix 
corresponding  to  the  directions. 

The alternating sign of the source vector makes the convolution operation unconventional. A precalculation method that
involves a cross product of two vectors can be employed: the first vector is the flipped data sequence of the image, and the
second vector is fixed and composed of +1 and -1. An example of the 1-D precalculation steps for a tap-6 kernel prior to the
convolution operation is given below:

Original data sequence:   a1, a2, a3, a4, a5, a6
Flipped data sequence:    a6, a5, a4, a3, a2, a1
Resultant data sequence:  a6, -a5, a4, -a3, a2, -a1


In the 2-D case, the three matrices associated with horizontal, vertical, and diagonal decomposition that serve as the second
matrix in the precalculation are shown in Figure 2. With this precalculation (or cross product of two matrices), only the 2-D
low-pass filter h_u h_v is needed for the final wavelet transform operation.

Figure 2. Three matrices used for the cross product precalculation.
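
The 1-D precalculation can be checked numerically for one output sample. The sketch below reads the "cross product" as an element-wise multiplication and assumes the common relation g[n] = (-1)^n h[L-1-n] between the high-pass and low-pass filters; under the opposite sign convention the fixed sign vector starts with +1 instead of -1. The Daubechies tap-4 coefficients are the standard closed-form values, not taken from this paper.

```python
import numpy as np

s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # Daubechies tap-4
L = len(h)
n = np.arange(L)

g = (-1.0) ** n * h[::-1]       # assumed relation: g[n] = (-1)^n h[L-1-n]
signs = (-1.0) ** (n + 1)       # fixed second vector: (-1, +1, -1, +1)

def precalc(x):
    """Flip the data window, then multiply element-wise by the fixed sign vector."""
    return signs * x[::-1]

x = np.array([2.0, -1.0, 0.5, 3.0])         # one window of image data
high_direct = np.dot(g, x)                  # conventional high-pass output
high_via_lowpass = np.dot(h, precalc(x))    # low-pass filter applied to precalculated data
print(high_direct, high_via_lowpass)
```

The two printed values agree, which is the property that lets a single low-pass kernel serve all channels once the data have been flipped and sign-alternated.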

Nevertheless, the resultant matrix of this precalculation (the cross product of two matrices) must be held in computer
memory to facilitate the forward convolution and the corresponding backpropagation. After precalculation,
the size of the intermediate images is (k/2 × k/2) times the original image size. The factor of 1/2 × 1/2 is due to the
two-dimensional 1/2 down-sampling in a conventional forward wavelet transform. The largest three blocks shown in Figure 3 are the
intermediate images S_0(xk/2, yk/2).

The updated filter may not satisfy the perfect reconstruction criterion after each update. However,
a small modification can make the final version of h_n fully meet the conditions of a wavelet filter set.
Based on each precalculated image S_0(xk/2, yk/2) described earlier, (17) can be rewritten for updating the 2-D wavelet
kernel:

$K(u,v)[t+1] = K(u,v)[t] + \eta \sum_{i,j} \mu(i,j)\, S_0(u - xk/2, v - yk/2) + \alpha\, \Delta K(u,v)[t]$  ...(21)

where the index i corresponds to the sub-image of S_0 matched to the kernel size. Although (21) represents the updated
kernel suggested by the backpropagation, these values require a conversion to a new wavelet kernel. Assuming the
wavelet filter is a 2-D vector (i.e., h_u h_v = h_v h_u, where u, v = 0, 1, 2, ..., k-1), only k free parameters ought to be
updated for each wavelet process. A solution that satisfies the wavelet constraints and makes h'_u h'_v approximately equal to
K'(u,v) is given in Section II.E.

Figure 3. A proposed wavelet searching scheme based on a grouped kernel backpropagation neural network to obtain an
optimal  kernel  for  image  compression. 


II.E. Converting the Kernel Suggested by the Neural Network to Fulfill Requirements of a Wavelet Filter

As indicated in (21), the updated weights, K(u,v)[t+1] or K'(u,v), of the kernel suggested by the BP at searching
iteration t+1 are independent. One must realize that each epoch in the neural network search yields only a suggestion or
approximation: the weight changes may produce a lower value for the defined error function, E_f. To properly use this
suggestion for making a new wavelet kernel, let us assume that there exists a set of h'_u such that the updated 2-D version of the
wavelet filter is very close to K'(u,v). A function based on the squared difference is used in the derivation:

$f(h'_u) = \sum_{u,v} (h'_u h'_v - K'(u,v))^2$  ...(22)

Here we intend to minimize the function f subject to the constraint equations. The Lagrangian multiplier method can be
employed to solve this problem by combining f and the constraint equations:

$df(h'_u) + \sum_p \lambda_p\, dC_p(h'_u) = 0$  ...(23)

where d represents the differentiation operation of a function and λ_p is the Lagrangian multiplier for the corresponding
constraint equation, C_p(h'_u) = 0, referred to (10) and (11). Using this approach we can obtain a set of h'_u while f is also
minimized.
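
For tap-4 filters the same goal can be illustrated without Lagrangian multipliers. The sketch below is a brute-force substitute, not the method described above: it assumes the well-known one-parameter family of orthonormal tap-4 low-pass filters (every member satisfies the orthonormality conditions and sums to sqrt(2)) and scans the angle theta for the member whose separable 2-D kernel is closest to a suggested kernel in squared difference. The function names and grid size are hypothetical.

```python
import numpy as np

def tap4_filter(theta):
    """Member of the one-parameter family of orthonormal tap-4 low-pass filters."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([1 - c + s, 1 + c + s, 1 + c - s, 1 - c - s]) / (2 * np.sqrt(2.0))

def nearest_wavelet_kernel(K_suggested, n_grid=20001):
    """Return the family member minimizing sum over u,v of (h_u h_v - K'(u,v))**2."""
    best_h, best_f = None, np.inf
    for theta in np.linspace(0.0, 2 * np.pi, n_grid):
        h = tap4_filter(theta)
        f = np.sum((np.outer(h, h) - K_suggested) ** 2)
        if f < best_f:
            best_h, best_f = h, f
    return best_h, best_f

# theta = pi/3 gives the Daubechies tap-4 filter (h0 = 0.48296291).
d4 = tap4_filter(np.pi / 3)
h_found, f_min = nearest_wavelet_kernel(np.outer(d4, d4))
print(d4, h_found, f_min)
```

Because every candidate already satisfies the wavelet constraints, the search only has to minimize the squared difference. The price is that this shortcut does not extend directly to longer filters, where a constrained solver such as the Lagrangian approach is needed.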

III.  Materials  and  Experimental  Methods 

A database consisting of 45 mammograms was used to conduct the study. Of these mammograms, 38 contain biopsy-proven
clustered microcalcifications. A total of 220 microcalcifications were embedded in 41 clusters. All 45 mammograms
were digitized by a Lumiscan (model 150) film digitizer (Lumisys, Sunnyvale, CA) with a spot size of 0.1 mm. Each patch of
32 × 32 pixels (i.e., an area of 3.2 mm × 3.2 mm) centered on the peak value was isolated for the study of the quantization
impact on microcalcifications. Note that typical sizes of microcalcifications range from 0.2 mm to 1.0 mm. It is important to
isolate a specific image pattern; otherwise the neural network search would lose focus and the study could fail.
The searches for optimal wavelet kernels for original mammograms and for microcalcification patches were
conducted as separate studies. Each image was decomposed by a 3-level wavelet transform. Quantization values were q, q/2,
and q/4 for the high-frequency coefficients on decomposition levels 1, 2, and 3, respectively. For each searching epoch, the
mean-square-error (MSE) between the original and decompressed images as well as %zeros (i.e., number of zeros / total
number of pixels) were computed. Since %zeros is generally the most important factor in achieving high compression,
it was used as a coarse index of compression efficiency for each epoch.
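
The two per-epoch measures can be sketched as below. The helper names are hypothetical, and the sketch assumes uniform rounding quantization and that %zeros is counted over the quantized high-frequency coefficients; the wavelet transform itself is omitted.

```python
import numpy as np

def level_steps(q, levels=3):
    """Quantization step per decomposition level: q, q/2, q/4 for levels 1, 2, 3."""
    return [q / 2**k for k in range(levels)]

def quantize(T, step):
    """Uniform quantization to integers; coefficients with |T| < step/2 map to 0."""
    return np.rint(np.asarray(T, dtype=float) / step)

def percent_zeros(quantized_bands):
    """Number of zero-valued quantized coefficients divided by the total count."""
    total = sum(b.size for b in quantized_bands)
    zeros = sum(int(np.count_nonzero(b == 0)) for b in quantized_bands)
    return zeros / total

def mse(original, reconstructed):
    """Mean-square-error between two arrays of equal shape."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(reconstructed, dtype=float)
    return float(np.mean((a - b) ** 2))

band = quantize([-3.0, 0.4, 7.9, 8.1, 20.0], step=16)
print(level_steps(64), band, percent_zeros([band]))
```

In a full experiment, each high-frequency band of each level would be quantized with its own step from `level_steps`, and the MSE would be taken between the original image and the image reconstructed from the dequantized coefficients.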

To compare the performance of the wavelets, we used the sorted first coefficient h_0 of the low-pass filter associated with
the mother scaling function as the horizontal axis, as shown in Figure 6, because the searching epoch does not identify the
wavelet being used. All h_0 values were greater than -0.1464466094 and smaller than 0.8535533905. The corresponding h_1
values are greater than 0.35355339 and smaller than 0.8535533905. Those h_1 values which are greater than -0.1464466094
and smaller than 0.35355339 have corresponding conjugate values in the former set and can be ignored.

We also performed the same study using the 220 isolated microcalcification patches. The 2-D profiles of
microcalcifications and their nearby areas (i.e., the areas that are not included in the microcalcification profile but lie within the
isolated 32 × 32 pixel patch) were evaluated separately during the neural network search. In addition, features
of the microcalcifications were computed to evaluate their changes. These features are:

(a) the peak value, P;

(b) the contrast, C = P - m_b,

where m_b is the average background value over the immediate boundary of the microcalcification profile;

(c) the signal-to-noise ratio, PSNR = C/σ_b,

where σ_b stands for the standard deviation of the background; and

(d) the area, A, occupied by the 2-D microcalcification profile.
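
The four features can be computed from a patch and a binary profile mask as sketched below. The text does not say how the profile or its immediate boundary is segmented, so the 4-connected one-pixel ring used here, the population standard deviation, and the names `boundary_ring` and `features` are all assumptions.

```python
import numpy as np

def boundary_ring(mask):
    """Pixels 4-adjacent to the profile mask but outside it (assumed background region)."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    grown = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
             | p[1:-1, :-2] | p[1:-1, 2:])
    return grown & ~mask

def features(patch, mask):
    """Peak P, contrast C = P - m_b, PSNR = C / sigma_b, and area A of the profile."""
    ring = boundary_ring(mask)
    P = float(patch[mask].max())
    m_b = float(patch[ring].mean())
    sigma_b = float(patch[ring].std())
    C = P - m_b
    return {"P": P, "C": C, "PSNR": C / sigma_b, "A": int(mask.sum())}

# Toy 7 x 7 patch with a one-pixel "microcalcification" at the centre.
patch = np.full((7, 7), 10.0)
patch[3, 3], patch[2, 3], patch[4, 3] = 50.0, 8.0, 12.0
mask = np.zeros((7, 7), dtype=bool)
mask[3, 3] = True
print(features(patch, mask))
```

Running the same function on the original and on the decompressed patch gives the feature changes that the evaluation compares.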

IV.  Results 

In the neural network searching process, the MSE was not the only factor of concern; the entropy reduction function
was another factor that drove the search. In the first neural network experiment, we found that
the MSE changed very little with a low quantization factor (q=16). The neural network's search for the next
wavelet kernel was random, and no minimum of MSE could be found in the mammogram study. However, the %zeros
changed, which led the neural network to converge at the maximum value of %zeros. In the microcalcification study, we
found that %zeros did not change much until h_0 > 0.6. Figure 4 shows examples of selected clustered microcalcifications.
Figure 5 shows the original learning steps. The figure indicates that MSEs moved toward smaller values using the proposed
neural network searching mechanism. The h_0 values of the searching epochs in Figure 5 were sorted in ascending order, and the
MSEs as well as %zeros are replotted in Figure 6, which shows that the Daubechies wavelet yields the lowest MSE. More
specifically, microcalcification profiles suffered higher MSEs than their background areas, as indicated in Figure 7.


Figure 4. Samples of clustered microcalcifications extracted from the mammograms.

Figure 5. MSEs decreased during the neural network search on 220 microcalcifications (q=64). Horizontal axis: training epoch.

Figure 6. Decomposition performance of wavelets on 220 microcalcifications (q=64). Horizontal axis: sorted h0 values of
various wavelet scale functions.

Figure 7. Decomposition performance of wavelets on 220 microcalcification profiles and background (q=64). Horizontal axis:
sorted h0 values of various wavelet scale functions.

Figure 8. Decomposition performance of wavelets on 220 microcalcification profiles and background (8-bit, q=16). Horizontal
axis: sorted h0 values of various wavelet scale functions.


These results changed when a very large quantization factor was used. In Figure 8, all the digital values of the
microcalcification patches were rounded off to 8 bits prior to the study, on the assumption that digitized mammograms contain
about 4 bits of noise [20]. Although the largest quantization factor was 16 for the 8-bit mammograms, the effective quantization
factor was equivalent to q ≈ 256 in 12-bit mammograms. Figure 8 shows that the Haar wavelet (h_0 = 0.0) yields the highest and the
lowest MSEs for 2-D microcalcification profiles and their backgrounds, respectively, whereas the Daubechies wavelet behaves
in the opposite way. This is probably because the Haar wavelet can produce a lower entropy in low-noise smooth areas.

The results of the microcalcification evaluation study based on quantized wavelet coefficients are shown in Figures 9-12.
The evaluation was performed under the same experimental conditions as in Figure 8; however,
microcalcification features were measured instead of MSEs and %zeros. Note that the percentages of microcalcifications with
decreased peak values, contrast, and SNR are shown as negative values. In other words, the lower the plotted value, the
more microcalcifications underwent negative changes. The figure of merit (FOM) for each measure was a composite value
given by

FOM = (%No. decrease × %decrease + %No. increase × %increase) × 100.  ...(23)


Figure 9. Peak value changes due to quantization effects on wavelet domain for microcalcifications. Horizontal axis: sorted
h0 values of various wavelet scale functions.

Figure 10. Contrast changes due to quantization effects on wavelet domain for microcalcifications. Horizontal axis: sorted
h0 values of various wavelet scale functions.

Figure 11. PSNR changes due to quantization effects on wavelet domain of microcalcifications. Horizontal axis: sorted h0
values of various wavelet scale functions.

Figure 12. Percent changes in microcalcification profiles due to quantization effects on wavelet domain. Horizontal axis:
sorted h0 values of various wavelet scale functions.

As indicated in Figure 9, the peak values changed very little. However, the percentages of microcalcifications with increased
peak values, contrast values, and SNRs had approximately the same distribution in Figures 9, 10, and 11.
The highest FOMs in all three measures occurred at the wavelet with the low-pass filter coefficients (0.32252136,
0.85258927, 0.38458542, -0.14548269), which is marked with an arrow on the sorted h_0 axis of the figures. We call this
wavelet a microcalcification friendly wavelet, or μCaF wavelet for short. Figure 12 shows minor percent area changes of
microcalcification profiles for h_0 values from 0.2 to 0.6. These effects were not observed when a low quantization factor was
used.

We also tested the algorithm on other images. Figures 13 and 14 show the curves of MSEs and %zeros against the
sorted h_0 values for the Lena image and mammograms, respectively. In both figures, the Daubechies wavelet (h_0 = 0.48296291)
and its nearby wavelets produce the highest %zeros, implying the largest compression ratio, for mammograms and the lowest MSE
for both the Lena image and mammograms. In these studies, we also found that when a larger quantization factor (q=64) was used,
the MSE appeared to guide the neural network search effectively. When a small quantization factor was used, the quantization
errors were somewhat random; hence the neural network might not function properly.


Figure 13. Decomposition performance of wavelets on the Lena image (q=64). Horizontal axis: sorted h0 values of various
wavelet scale functions.

Figure 14. Decomposition performance of wavelets on mammograms (q=64).


V.  Discussion 

Having observed the above results, it is technically interesting to review the spectrum of wavelets mentioned. The
low-pass h and the high-pass g filters of the wavelets are shown in Figures 15 and 16, respectively. Note that the μCaF wavelet is
the same wavelet marked on the horizontal axis in Figures 9, 10, and 11. Attention should be paid to the g filters since they
are responsible for decomposing high-frequency coefficients regardless of quantization. Essentially, the g filter computes
the positive weight multiplied by the center pixel value plus the negative weights multiplied by the adjacent pixel values on
the two sides. The Daubechies wavelet has quite balanced negative terms on the two sides of
the positive weight, and the sum of the negative weights is equal in magnitude to the positive weight. The latter is a constraint
on all wavelet filters anyway. In addition, the absolute value of g_1 (= -h_2) or g_2 (= h_1) should be reasonably large, which
maintains the low-pass and high-pass characteristics of the h and g filters, respectively. In fact, the wavelets near
the Daubechies wavelet, including the μCaF wavelet, possess this property. From the signal processing point of view, these
balanced weights in a filter are very important for creating low entropy values for general textures. We suspect
that this property may be related to the so-called "high regularity" in wavelet theory.

In short, we found that the main reason a wavelet filter can produce a low entropy for a set of data is that the
weights of the g filter sum to zero. For a general data sequence, the g filter can perform even better when

(a) the absolute value of g_1 (= -h_2) or g_2 (= h_1) is much larger than that of the other weights;

(b) the oppositely signed weights are evenly distributed on the two sides of g_1 or g_2.
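
These properties can be inspected directly from the filter coefficients. The sketch below assumes the relation g[n] = (-1)^n h[L-1-n], which for tap-4 filters gives g = (h3, -h2, h1, -h0). The Haar entry is written in a tap-4 form with h0 = 0.0, and the Daubechies coefficients are the standard published values; only the third filter is quoted from this paper.

```python
import numpy as np

def highpass_from_lowpass(h):
    """g[n] = (-1)^n h[L-1-n]; for tap-4 filters this is (h3, -h2, h1, -h0)."""
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    return (-1.0) ** n * h[::-1]

filters = {
    "Haar (tap-4 form)": [0.0, 2**-0.5, 2**-0.5, 0.0],
    "Daubechies": [0.48296291, 0.83651630, 0.22414387, -0.12940952],
    "muCaF": [0.32252136, 0.85258927, 0.38458542, -0.14548269],
}

for name, h in filters.items():
    g = highpass_from_lowpass(h)
    # The g weights sum to (nearly) zero; the largest weight is g_2 = h_1.
    print(name, np.round(g, 4), "sum =", round(float(g.sum()), 7))
```

For the Daubechies and muCaF filters the large positive weight is flanked by negative weights on both sides, while the Haar filter has a single negative neighbour. This matches the distinction drawn in (a) and (b).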

Figure 15. Low-pass filters of several interesting wavelets.

Figure 16. High-pass filters of the same wavelets.


For low-noise smooth signals, the Haar wavelet may slightly outperform the others. For sharp edges, the Haar wavelet would
greatly outperform the others, as depicted in Figure 17, where only bones and the edges between bones and soft tissues
isolated on computed tomography (CT) images were evaluated. One of the tested CT head images is
shown in Figure 18.


Figure 17. Decomposition performance of wavelets on CT head bones and bone edges (q=64).


Figure 18. A CT head image.


It was interesting to find that the μCaF wavelet with (h_m, m=0..3) = (0.32252136, 0.85258927, 0.38458542,
-0.14548269) results in the highest fidelity of features. Figure 9 provides evidence that the MSEs of 2-D microcalcification
profiles and background gradually merge from the Haar to the Daubechies wavelet. Since contrast and PSNR values are computed
using the peak and background values of the microcalcifications, the optimum of these measures should occur somewhere
between the Haar and Daubechies wavelets.

In the field of compression, it is known that the higher the compression ratio, the higher the error generated
in the decompressed image. However, through these studies we discovered a new phenomenon associated with these two
main quantitative measures in compression. We found that higher compression coincided with less error in all the studies (see
Figures 4, 5, 7, & 16) using a fixed quantizer. This may be because high compression is associated with low entropy, which
means that the data contain more low values and less variation between the originally transformed and quantized coefficients.
This phenomenon happens only when the quantization factor is fixed. We would like to call the reader's attention to
the link between this phenomenon and the designed error function, which comprises MSE and entropy reduction terms for
searching an optimal wavelet in the convolution neural network. With this concurrent trend (i.e., less error is associated with
low entropy using a fixed quantizer), the neural network appears to operate effectively in this searching task. Otherwise
the two terms would have acted as competing factors and would have made the neural network search difficult.

Although  we  have  shown  the  general  framework  of  a  wavelet  filter  search  using  a  neural  network  algorithm,  only  tap-4 
wavelets  were  employed  in  our  experiment.  It  seems  that  the  above  findings  can  be  generalized  for  high  order  wavelets 
because  the  g  filter  is  the  key  operator  for  the  wavelet  decomposition.  The  distribution  of  weights  for  high  order  wavelets 
should  be  maintained  as  discussed  above  in  order  to  obtain  a  low  entropy.  We  will  continue  to  investigate  the  performance  of 
dual  low-pass  filter  wavelets  where  both  an  odd  and  even  number  of  weights  are  used.  We  predict  that  high  performance 
wavelets in compression and data accuracy should possess balanced distribution of weights in the g filter of wavelets [10, 15].

In  our  previous  papers,  we  indicated  that  wavelet  (both  single  and  dual  low-pass  filter  systems)  decomposition  might  be 
appropriate  for  low-resolution  small  images  such  as  the  Lena  image,  CTs  and  MRIs.  For  high-resolution  large  images  such  as 
digitized chest radiographs and mammograms, we found that the full-frame DCT performed with the highest compression
efficiency  [21].  This  is  because  the  DCT  can  pack  highly  correlated  image  information  in  a  small  frequency  area.  The  DWT, 
however,  requires  many  levels  in  decomposition  to  achieve  a  high  compression  ratio.  The  data  inaccuracy  would  propagate 
from  high  level  wavelet  domains  to  low  level  and  to  the  reconstructed  image. 

VI. Conclusions

A  neural  network  based  method  has  been  developed  to  search  for  optimal  wavelet  kernels  which  can  produce  the  most 
favorable  set  of  transform  coefficients  to  preserve  data  accuracy  and/or  defined  image  features  during  the  compression.  In 
this paper, our technical achievements are: (a) development of a unified method to facilitate multi-channel wavelet
decomposition; (b) design of a cost (error) function consisting of the MSE and an imposed entropy reduction function to seek an
optimal wavelet kernel in the convolution neural network; and (c) conversion of a kernel suggested by the neural network into
a filter constrained by the wavelet requirements.

In all medical image modalities we have tested so far (including mammography, CT, and MRI), the Daubechies wavelet or
its nearby wavelets generally yield slightly better compression results than other wavelets based on the measure
of mean-square-error. With a large quantization factor, the Haar wavelet produces the lowest and highest MSEs for the
background and microcalcification profile areas, respectively, whereas the Daubechies wavelet produces the opposite result. In
addition, we found that the μCaF wavelet (i.e., 0.32252136, 0.85258927, 0.38458542, -0.14548269) possesses the highest
feature preservation capability in microcalcification peak, contrast, and PSNR. Through this study, we also found that the Haar
wavelet sometimes produced a dramatic result for high-contrast edges. In addition, optimization usually occurs on a band of
wavelets, not at a single wavelet.

We, therefore, conclude that the Daubechies wavelet (and its nearby wavelets) is generally applicable for image
compression. However, the Haar wavelet is suitable for low-noise smooth areas and sharp edges. For a specific image pattern
such as microcalcifications on mammograms, one may find a specific wavelet filter that best preserves the features.

By reviewing the g filters of various wavelets, we found that the optimal wavelets for general image texture have some
things in common: they possess balanced negative terms on the two sides of the positive weight, and the absolute value of g_1 or
g_2 is much larger than that of the other weights.

Abstract  A wavelet based two-dimensional convolution neural network (WBCNN) has been developed for image
pattern recognition. The structure of the convolution is based on the neocognitron. Nets between two adjacent layers in
the feature selection level of the neural network are selectively interconnected across groups. Each group in the receiving
layer receives signals from a group of weights (i.e., kernels). For the forward signal propagation, the product obtained
from convolving the kernel with the front layer is collected onto the corresponding matrix element of the receiving layer.
Since isolated patterns are processed by internal filtering and classification layers built into the neural network structure, the
image patterns are expected to be more recognizable. The WBCNN was trained by a grouped process of
backpropagation. In the WBCNN, we forced each updated convolution kernel to be orthonormal. Therefore, features
(transformed coefficients) selected on the transform domain are linearly independent. Hence, the fully connected layers
in the classification level of the CNN can perform more effectively.

The applications of the CNN for pattern recognition have been very successful. In this initial study, we used only a
two-level structure and eliminated all complex-cell layers to evaluate the effects of the wavelet kernel process. Although we
obtained only limited improvement in ROC performance using the WBCNN in the mammographic microcalcification studies,
this method can assist us in the analysis of the trained kernels and is expected to lead to the optimization of feature
extraction in the course of pattern recognition.

Key  words:  Artificial  neural  network,  wavelet  decomposition,  detection  of  microcalcifications,  and  image  pattern 
recognition. 

I.  Introduction 

Currently, the scope of research activities involving the wavelet transform [1], [2] and artificial neural networks [3], [4]
extends to a broad variety of fields. Successful applications have been reported in many areas; however, not much work
has been done to combine the strengths of the two techniques in advancing the technology. In this study, we modify the
two-dimensional convolution processing of a newly developed neural network to adapt the kernels to a wavelet basis. The
reasons for using wavelet kernels for the convolution process in the neural network are: (a) features extracted with
wavelet decomposition are linearly independent, (b) the many choices of wavelet bases allow the optimization of the
system, and (c) the capability to perform multiresolution analysis.

One of the major criticisms of using a conventional backpropagation neural network (BPNN) for image recognition
is that the neural nets are fully and uniformly connected from one node of the upper layer to all nodes in the lower layer.
However, there is no guarantee that adjacent pixel information in the image is weighted more heavily than non-local pixel
information during the training. On the other hand, the convolution neural network (CNN), which is a simplified version
of the vision-type neural network [5], has been inherently designed to perform local feature extraction. In this paper, we
chose the CNN as the fundamental architecture of the signal propagation platform and intend to explore advanced image
pattern recognition algorithms. We follow a two-step approach to computer search [6]-[9]: (a) a preliminary search for
extracting suspected disease areas, and (b) examination of the suspected areas for final classification. The second step is
the principal subject of this paper.

II.  Algorithm  Development 

II.A. Review of Convolution Neural Network

Compared to other artificial neural networks, the neocognitron [5] possesses a network structure most similar to
human vision. The neocognitron is composed of an input layer (retina) and four levels of grouped neurons. Each level
consists of two layers. The front layer of each level is called a simple-cell layer. The second layer is a complex-cell
layer. Each layer is composed of many neurons which are collected into several groups. Nets from the simple-cell layer to
the complex-cell layer do not interconnect between groups. Nets from the complex-cell layer to the next simple-cell layer
are selectively interconnected between groups. All the nets are organized with a 2-dimensional convolution kernel. In the
training, the weights are adjusted by relatively complicated rules (functions) and inhibitory and excitatory theories.

In the convolution neural network, weighting factors are shared and are formed as a kernel for each group in a
collection layer. The convolution is processed between the weighting kernel and the front layer. This accounts for the
major differences between the vision type neural network and the regular fully connected neural network.

Figure 1 shows that, except for the nets between the last hidden layer and the output layer, convolution processes are
applied to an image block (e.g., X × Y pixels) with a convolution kernel (e.g., of size k × k). The resultant data can also
be organized as 2-D feature maps. Depending upon the number of independent kernels used (e.g., N kernels), we will
receive N groups of 2-D feature maps in the next hidden layer. All nodes (e.g., $O_n$) on the output layer are fully
connected to the last hidden layer. This neocognitron variant can be interpreted as follows: (a) the convolution
processes are designed to perform automatic feature extraction onto feature maps in the hidden layers; (b) the fully
connected networks, which process the final signals, are used to make the final classification as in a regular neural
network.


[Figure 1: network diagram; surviving panel label: "Hidden layer 1".]

Figure 1. A simplified convolution neural network.

Signal  Propagation  and  Training  of  the  CNN 

The signal propagation and backpropagation for the fully connected layers follow the standard BPNN algorithm [3].
However, the signal propagation from the input layer to the feature maps involves a convolution computation and is
given below:

$$S_x((i,j);n) = \frac{1}{1+\exp\left\{-\sum_{m}\left[K_x((u,v);n,m)\otimes S_{x-1}((i,j);m)\right]\right\}} \qquad (1)$$

or

$$S_x((i,j);n) = \frac{1}{1+\exp\left\{-\sum_{u,v,m} K_x((u,v);n,m)\,S_{x-1}((i-u,j-v);m)\right\}} \qquad (2)$$

where $S_x((i,j);n)$ represents the signal at node $(i,j)$ of the $n$th group in layer $x$, and $K_x((u,v);n)$ denotes the
weighting factor value at net $(u,v)$ of the $n$th group, connecting layer $x-1$ to layer $x$. The additional index $m$ in
$K_x((u,v);n,m)$ represents the connection between group $m$ in layer $x-1$ and group $n$ in layer $x$.
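
The forward propagation of Eq. (2) can be sketched in plain Python. This is only an illustration: the function name `feature_map`, the nested-list layout, and the choice to compute only the nodes where the kernel fully overlaps the input are assumptions, since the text does not describe border handling.

```python
import math

def feature_map(prev_maps, kernels):
    """Sketch of Eq. (2) for one receiving group n.

    prev_maps: list over groups m of 2-D lists S_{x-1}(.; m)
    kernels:   list over groups m of k-by-k lists K_x(.; n, m)
    Only nodes where the kernel fully overlaps the input are computed
    (an assumption made for this sketch).
    """
    k = len(kernels[0])
    rows, cols = len(prev_maps[0]), len(prev_maps[0][0])
    out = []
    for i in range(k - 1, rows):
        row = []
        for j in range(k - 1, cols):
            total = 0.0
            for s_prev, kern in zip(prev_maps, kernels):
                for u in range(k):
                    for v in range(k):
                        total += kern[u][v] * s_prev[i - u][j - v]
            row.append(1.0 / (1.0 + math.exp(-total)))  # sigmoid activation
        out.append(row)
    return out

# Toy input: one 4x4 map of ones and a 2x2 kernel that picks a single pixel.
ones = [[1.0] * 4 for _ in range(4)]
pick_first = [[1.0, 0.0], [0.0, 0.0]]
fmap = feature_map([ones], [pick_first])
```

With several input groups, the same call sums the convolutions over m before the sigmoid is applied, which is the role of the sum over m in Eq. (1).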

Similar to the fully connected network in a backpropagation neural network, the iterative update of the kernel weights
is:

$$K_x((u,v);n)[t+1] = K_x((u,v);n)[t] + \eta\sum_{i,j}\left\{\delta_x((i,j);n)\,S_{x-1}((i-u,j-v);m)\right\} + \alpha\,\Delta K_x((u,v);n)[t] \qquad (3)$$

where $t$ is the iteration number during the training, $\alpha$ is the gain for the momentum term received in the last
learning loop, $\eta$ is the gain for the current weight changes, and $\delta$ is the weight-update function, which is given as

$$\delta_x((i,j);n) = Q_x((i,j);n)\,S_x((i,j);n)\left[1 - S_x((i,j);n)\right] \qquad (4)$$

where $Q_x((i,j);n) = \sum_{u,v,m} K_{x+1}((u,v);m)\times\delta_{x+1}((i+u,j+v);m)$.
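
As a rough illustration of the kernel update with momentum in Eq. (3), the sketch below adds the gain times the accumulated product of error terms and input signals, plus the momentum gain times the previous change. The function name `update_kernel`, the argument names, and the alignment between feature-map nodes and input nodes are assumptions made for this example.

```python
def update_kernel(kernel, prev_delta_k, delta, s_prev, eta, alpha):
    """One update step in the spirit of Eq. (3) for a single (n, m) group pair.

    kernel:       k-by-k weights K_x[t]
    prev_delta_k: k-by-k previous change, Delta K_x[t]
    delta:        2-D error terms delta_x((i, j); n) on the feature map
    s_prev:       2-D signals S_{x-1}(.; m) feeding that map
    Feature-map node (i, j) is taken to align with input node
    (i + k - 1, j + k - 1), i.e. a full-overlap convolution (assumption).
    Returns (new_kernel, new_change).
    """
    k = len(kernel)
    new_kernel, new_change = [], []
    for u in range(k):
        k_row, c_row = [], []
        for v in range(k):
            grad = 0.0
            for i, d_row in enumerate(delta):
                for j, d in enumerate(d_row):
                    grad += d * s_prev[i + k - 1 - u][j + k - 1 - v]
            change = eta * grad + alpha * prev_delta_k[u][v]  # gain term + momentum term
            c_row.append(change)
            k_row.append(kernel[u][v] + change)
        new_kernel.append(k_row)
        new_change.append(c_row)
    return new_kernel, new_change

# Smallest possible case: a 1x1 kernel, one error term, one input signal.
new_k, change = update_kernel([[1.0]], [[0.5]], [[2.0]], [[3.0]], eta=0.1, alpha=0.2)
```

The returned change is kept so that it can be passed back in as the momentum term of the next iteration.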

Classification  of  Output  Values  in  the  Testing 

Corresponding to the grading system arranged in the training, a polarized (linearly weighted) function is given as an
indication. With this we can define a normalized object detection index (NODI) for the judgment of a suspected area:

^Eto„x(n-(A(-l)/2)] 

NODI  =  -  •••(5) 

xk]x(iV-l)/2 

n=0 

where n denotes the node in the output layer, $O_n$ is the output value at node n, and N is the total number of output nodes.

Hence an object detection index of 0 or near 0 indicates a definite non-object, and an object detection index of 1 or
greater implies a definite case in the judgment of the neural network. The reason for the weighting is that the score
line is centered at (N-1)/2 (i.e., 0.5 for 2 nodes in the output layer) and the polarization of true and false depends on
the position of the nodes.

After receiving the NODI value from each suspected area, we use a computer program (LABROC) [14], based on
receiver operating characteristic (ROC) analysis [15], to evaluate the performance of the neural networks. The area
under the ROC curve, referred to as Az, can be read as a performance index of the system. In general, the
higher the Az is, the better the performance.
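
To make the meaning of the area under the ROC curve concrete, the sketch below computes a nonparametric (rank-based) estimate from two lists of scores. This is a swapped-in technique, not the LABROC procedure, which fits a smooth ROC curve to the data; the function name and the toy scores are hypothetical.

```python
def empirical_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical index values for true and false suspected areas.
auc = empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

An estimate of 1.0 means every true area scored above every false one, while 0.5 corresponds to chance-level ranking.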


II.B. Wavelet Kernels for CNN

Two-Dimensional  Wavelet  Transform 

In the process of a two-dimensional wavelet decomposition, the horizontal (x-) and vertical (y-) directions are
considered preferential. Following Mallat's 2-D wavelet analysis [16], the two-dimensional scaling function is
composed of two one-dimensional scaling functions, one in each direction:

$$\phi(x,y) = \phi(x)\,\phi(y) \qquad (6)$$

where $\phi(x)$ is a scaling function. The associated two-dimensional wavelets are defined as

$$\psi^{1}(x,y) = \phi(x)\,\psi(y) \qquad (7)$$

$$\psi^{2}(x,y) = \psi(x)\,\phi(y) \qquad (8)$$

$$\psi^{3}(x,y) = \psi(x)\,\psi(y) \qquad (9)$$


where $\psi(x)$ is the 1-D wavelet corresponding to the 1-D scaling function. Using the sub-band coding algorithm, the
wavelet transform (2-D DWT) of a matrix has four parts:

$$W_{LL}(f(n,m)) = \sum_{u,v}\left[(f(n,m)\,h(u-2n,0))\,h(0,v-2m)\right] = \sum_{u,v}\left[f(n,m)\,h_{LL}(u-2n,v-2m)\right] \qquad (10)$$

$$W_{LH}(f(n,m)) = \sum_{u,v}\left[(f(n,m)\,h(u-2n,0))\,g(0,v-2m)\right] = \sum_{u,v}\left[f(n,m)\,h_{LH}(u-2n,v-2m)\right] \qquad (11)$$

$$W_{HL}(f(n,m)) = \sum_{u,v}\left[(f(n,m)\,g(u-2n,0))\,h(0,v-2m)\right] = \sum_{u,v}\left[f(n,m)\,h_{HL}(u-2n,v-2m)\right] \qquad (12)$$

$$W_{HH}(f(n,m)) = \sum_{u,v}\left[(f(n,m)\,g(u-2n,0))\,g(0,v-2m)\right] = \sum_{u,v}\left[f(n,m)\,h_{HH}(u-2n,v-2m)\right] \qquad (13)$$

where the h and g functions are the low and high pass filters of the subband decomposition, related by the condition
$g(u) = (-1)^{u}\,h(1-u)$. The low pass filter, h, must also satisfy three criteria to construct an orthonormal basis of
compactly supported wavelets [1], [2]: (a) $\sum_u h(2u) = \sum_u h(2u+1) = \sqrt{2}/2$; (b) h should be orthonormal; and
(c) h should have a certain degree of regularity. The 2-D filters in the second forms of the above four equations are the
vector products of the h and/or g filters. The relationship between the high pass and low pass filters makes the
unification of the above four sets of decomposition possible.
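
The four-part decomposition can be illustrated with a one-level separable transform in plain Python. The sketch is hedged in several ways: the high pass filter is indexed as g(u) = (-1)^u h(k-1-u), which is the relation given in the text shifted by an even amount so that indices run from 0 to k-1; borders are handled by periodic extension; and the function names and the dictionary keys are all choices made for this example.

```python
import math

def highpass_from_lowpass(h):
    """g(u) = (-1)^u h(k-1-u), an even shift of g(u) = (-1)^u h(1-u) (assumption)."""
    k = len(h)
    return [(-1) ** u * h[k - 1 - u] for u in range(k)]

def analyze_1d(signal, filt):
    """Filter and keep every second sample, using periodic extension (assumption)."""
    n = len(signal)
    return [sum(filt[u] * signal[(2 * i + u) % n] for u in range(len(filt)))
            for i in range(n // 2)]

def dwt2_one_level(image, h):
    """Return four sub-bands keyed by the filters applied along rows, then columns."""
    g = highpass_from_lowpass(h)
    bands = {}
    for name_1, f_1 in (("L", h), ("H", g)):
        first_pass = [analyze_1d(row, f_1) for row in image]       # along each row
        columns = [list(col) for col in zip(*first_pass)]
        for name_2, f_2 in (("L", h), ("H", g)):
            filtered = [analyze_1d(col, f_2) for col in columns]   # along each column
            bands[name_1 + name_2] = [list(r) for r in zip(*filtered)]
    return bands

# A constant image has all of its energy in the low-low band.
haar = [math.sqrt(2) / 2, math.sqrt(2) / 2]
flat = [[1.0] * 4 for _ in range(4)]
bands = dwt2_one_level(flat, haar)
```

Each sub-band has half the rows and half the columns of the input, which is the two-dimensional down-sampling by 1/2 of a conventional forward wavelet transform.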

According to wavelet theory, given a set of h, one can calculate the Fourier transforms of the
scaling and wavelet functions as follows:

$$\Phi(w) = H_0(e^{iw/2})\,\Phi(w/2) \qquad (14)$$

$$\Psi(w) = H_1(e^{iw/2})\,\Phi(w/2) \qquad (15)$$

where $H_0$ and $H_1$ are the Fourier transforms of the h and g filters, respectively. Hence, both the scaling and wavelet
functions can be obtained through infinite recursion by using Eqs. (14) and (15), respectively.

Using  the  Low  Pass  Filter  for  the  Four  Channels  Decomposition  of  2-D  DWT 

Using Eq. (11) as an example, the decomposition equation can be rewritten by replacing the g filter with the h filter:

$$W_{LH}(f(n,m)) = \sum_{u,v}\left[(f(n,m)\,h(u-2n,0))\,(-1)^{v}\,h(0,2m+1-v)\right] \qquad (16)$$

or 

"*))  =  51  f(n,-m))h(u  -  2n,0))h{0,v  -  2m )] 

«,v 

~  51  [(((“1)  2n,  V  -  2m)]  ...(17) 

w.v 

=  X  ~  2n, V  -  2m)] 


Converting Eq. (12) to use the 2-D low pass filter as the kernel is a matter of changing the orientation from the y- to
the x-direction (or combining both directions for Eq. (13)). These conversions also indicate that one can use a single 2-D
filter to compute the four quadrants of the 2-D wavelet transform by flipping the matrix position in the x- and/or
y-direction(s) and alternating the sign of the flipped matrix corresponding to the direction(s).
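
The idea of replacing the g filter by a flipped h filter applied to sign-alternated data can be checked numerically in one dimension. The sketch below is an illustration only: the function names are hypothetical, samples outside the signal are treated as zero, and the test values are arbitrary rather than a real wavelet filter.

```python
def highpass_direct(f, h, m):
    """Sum over v of f(v) g(v - 2m), with g(u) = (-1)^u h(1 - u)."""
    k = len(h)
    total = 0.0
    for u in range(2 - k, 2):            # g(u) is non-zero only for 2-k <= u <= 1
        v = u + 2 * m
        if 0 <= v < len(f):
            total += f[v] * (-1) ** u * h[1 - u]
    return total

def highpass_via_lowpass(f, h, m):
    """Same value using only h: alternate the sign of f, then read h flipped."""
    signed = [(-1) ** v * x for v, x in enumerate(f)]
    total = 0.0
    for v, x in enumerate(signed):
        idx = 2 * m + 1 - v              # flipped position in h
        if 0 <= idx < len(h):
            total += x * h[idx]
    return total

f = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 1.0, 3.0]
h = [0.1, 0.2, 0.3, 0.4]                 # arbitrary values, not a wavelet filter
pairs = [(highpass_direct(f, h, m), highpass_via_lowpass(f, h, m)) for m in range(4)]
```

Both functions return the same number for every m, so the sign alternation can be applied to the data once, ahead of time, and only the low pass filter needs to be stored and trained.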

The alternated sign of the source matrix makes the convolution operation unconventional. We have developed a
precalculation method that involves a cross product of two matrices: the flipped version of the original image is the first
matrix, and the associated second matrix, shown in Figure 2, is composed of +1 and -1. However, the resultant matrix of
this precalculation (or cross product of two matrices) must be held in computer memory to facilitate the computation
of the forward convolution and the corresponding backpropagation. After precalculation, the size of the intermediate
image is (k/2 × k/2) times the original image size. The factor of 1/2 × 1/2 is due to the two-dimensional down-sampling
by 1/2 in a conventional forward wavelet transform.

[Figure 2 panels: vertical operator, horizontal operator, diagonal operator.]

Figure 2. Three matrices used for the cross product precalculation.


The  CNN  Convolution  Process  with  Wavelet  Kernels 

Since we can combine all four convolution operations by using only one kernel, the wavelet convolution operation
can be adapted to the CNN convolution processing described earlier. This decomposition platform is particularly
convenient for the CNN backpropagation training. Figure 3 shows the block diagram, in four sections, of the wavelet
decomposition processes for the forward and backpropagation calculations.

The above convolution processing using a wavelet kernel would replace only one of the N feature maps in the
hidden layer of Figure 1. To replace all N feature maps, a total of 4N channels with N independent wavelet kernels is
required. In Figure 3, the updated filter kernel is not guaranteed to satisfy the criteria required of a low pass filter
for a wavelet transform.

The 2-D composed low pass filters $h_{LL}$ in the WBCNN serve the same role as the kernels K in the conventional
CNN. To satisfy the criteria of a wavelet transform, the updated low pass filters would require the following conditions
to be fulfilled:

(A) using the known scaling function of the wavelet kernels $h_u$ for the initialization of the CNN kernels;

(B) the old and new kernels are constrained by: (i) $\sum_u h_{2u}[t] = \sum_u h_{2u+1}[t] = \sqrt{2}/2$ and (ii)
$\sum_u h_u\,h_{u+2n} = \delta_{u,u+2n}$, where $\delta$ is the Dirac delta function and n is an integer. These two
constraints ensure the orthogonal property of the $h_u$ and $g_u$ filters;

(C) computing new scaling and wavelet functions using the recursive algorithm indicated by Eqs. (14) and (15) to
ensure their existence; otherwise, the new filter in (B) must be modified.
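
The two constraints in condition (B) are easy to test for a candidate low pass filter. The following sketch checks the even and odd sums and the orthogonality under even shifts; the function name `check_wavelet_lowpass` and the tolerance are assumptions, and the Daubechies 4-tap coefficients are used only as a well-known filter that should pass.

```python
import math

def check_wavelet_lowpass(h, tol=1e-9):
    """Return (sums_ok, shifts_ok) for the two constraints of condition (B)."""
    target = math.sqrt(2) / 2
    sums_ok = (abs(sum(h[0::2]) - target) < tol and
               abs(sum(h[1::2]) - target) < tol)
    k = len(h)
    shifts_ok = True
    for n in range(k // 2):
        # Inner product of the filter with itself shifted by 2n taps.
        dot = sum(h[u] * h[u + 2 * n] for u in range(k - 2 * n))
        expected = 1.0 if n == 0 else 0.0
        if abs(dot - expected) > tol:
            shifts_ok = False
    return sums_ok, shifts_ok

s3 = math.sqrt(3)
scale = 4 * math.sqrt(2)
daub4 = [(1 + s3) / scale, (3 + s3) / scale, (3 - s3) / scale, (1 - s3) / scale]
result = check_wavelet_lowpass(daub4)
```

A kernel produced by an unconstrained backpropagation step would generally fail both checks, which is why the update has to be converted back to a valid filter.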

One of the original criteria, the so-called "high degree of regularity", was not enforced in the algorithm.
The orthonormality of the filter may not be self-sustained with each updated version. However, a small modification
can make the final version of $h_u$ orthonormal if the conditions of being a wavelet filter set must be fully met.
Nevertheless, the CNN trained filters, which are adaptive versions of wavelet kernels, may already serve as
optimal feature extractors in applications of image pattern recognition.

Updating $h_{LL}$, the low pass filter in 2-D, through Eq. (10) would require complicated algorithm
development to satisfy the criteria required by wavelet theory. However, based on the unified kernel technique with the
precalculated image $S_0(xk/2, yk/2)$ described earlier, Eq. (3) can be rewritten for updating the 2-D wavelet kernel:

$$K_{u,v}[t+1] = K_{u,v}[t] + \eta\sum_{i}\delta(i)\,S_0(xk/2+u,\,yk/2+v) + \alpha\,\Delta K_{u,v}[t] \qquad (18)$$

where the index i = 0, 1, ... corresponds to the sub-image of $S_0$ matched to the kernel size. Eq. (18) represents the
updated kernel suggested by the BP; these values require a conversion to a new wavelet kernel $h'_u h'_v$. Assuming the
wavelet filter is a scale vector (i.e., $h_{u,v} = h_u h_v = h_{LL}$, where u, v = 0, 1, 2, ..., k-1), only k free parameters
need to be trained for a set of wavelet transforms. A solution that satisfies condition (B) and makes $h'_u h'_v$
approximately equal to $K_{u,v}$ is given in the Appendix.

Since the decomposed feature maps on the low-low sub-channel have different image characteristics from the
others, we free the kernel so that the low-low channel is not constrained by the other three channels. In this way, each
set of decomposition has two kernels: one operates on the low-low channel, and the other operates on the remaining
three channels. Without the separation of kernels, we found that the neural network had difficulty reaching
convergence in the training. In fact, when we used completely uncorrelated wavelet kernels, the results improved, as
described in the experiment.


[Figure 3 legend: activation function; error back-propagation training through inverse convolution; precalculation for
horizontal convolution operation; precalculation for vertical convolution operation; precalculation for diagonal
convolution operation.]

Figure 3. Signal propagation block diagram for a section of the convolution operation using a wavelet kernel in the
CNN/WK architecture.


II.C. Classification Invariance of Matrix Operations

It is a good strategy to use the invariance or variance characteristics of the system when training a neural network. In
many situations, the orientation of an image pattern is not used as a clinical indication of a disease.
Hence, we can take advantage of this characteristic as an invariance. In practice, one can rotate and/or shift the input
matrix and maintain the same output assignments for the training. This method has two effects on the neural
network: (a) it instructs the neural network that a rotated or shifted input vector should receive the same
classification result; and (b) it increases the total number of training samples, which is expected to enhance the
performance of the neural network. From the point of view of the intermediate matrices in the CNN, the convolution
operation maintains the geometrical correspondence between the feature maps and the original matrix. In other words,
the feature maps can be rotated and/or shifted by applying the same convolution operator to the matrix before or after
applying a geometrical function, as shown in Eqs. (19) and (20).

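$$S_x((i,j)^{\theta};n) = \frac{1}{1+\exp\left\{-\sum_{m}\left[K_x((u,v);n,m)\otimes S_{x-1}((i,j)^{\theta};m)\right]\right\}} \qquad (19)$$

or

$$S_x((i,j)^{\theta};n) = S_x((i\cos\theta + j\sin\theta,\,-i\sin\theta + j\cos\theta);n). \qquad (20)$$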

In practice, equation (20) is much more computationally efficient. Besides, these two equations have different meanings
in terms of neural network training. In equation (19), the neural network treats each rotated matrix as a separate
matrix. Therefore, the backpropagation computation is in effect for each rotated matrix in every epoch. With equation
(20), however, the backpropagation would be performed only once for all corresponding rotated/shifted feature maps.
Hence, the convolution kernel (or wavelet kernel when the wavelet operation is used) receives much more training with
the former method than with the latter.

Rotation may require interpolation, which slightly alters the pixel values but should be acceptable for the input
of the CNN. However, the use of shifting can be complicated, because it depends on (a) how important the center
information of disease patterns is in the neural network learning; and (b) how much shifting can be used without
sacrificing critical portions of the image information.
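
One way to generate orientation variants of an input matrix without any interpolation is to use 90-degree rotations and a mirror image. The sketch below produces eight such versions; this particular set of eight and the function names are assumptions made for illustration, since the text does not state which rotation angles were used.

```python
def rotate90(matrix):
    """Rotate a 2-D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*matrix[::-1])]

def eight_orientations(matrix):
    """Four 90-degree rotations of the matrix and of its mirror image."""
    versions = []
    for start in (matrix, [row[::-1] for row in matrix]):
        current = [list(row) for row in start]
        for _ in range(4):
            versions.append(current)
            current = rotate90(current)
    return versions

versions = eight_orientations([[1, 2], [3, 4]])
```

During training, every version would be presented with the same output assignment as the original block; arbitrary angles would need interpolation, as noted above.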

III. Application To The Detection Of Microcalcifications

III.A. The Wavelet-Based Convolution Neural Network

In this initial study, we used only a one-hidden-layer structure and eliminated all the complex-cell layers. The
hidden layer is composed of two dozen groups of feature maps. In this study, we basically replaced the convolution
kernel in Figure 1 with a wavelet constrained kernel. The wavelet kernels to be trained are responsible for feature
extraction from the input matrix. The fully connected nets are trained at the same time as the kernels between the input
and the first hidden layer and are responsible for merging the extracted features for classification. Two output nodes are
shown in Figure 1 to keep the system general, particularly for the case where fuzzy output training is used. For crisp
training (e.g., (0,1) and (1,0) for false and true cases, respectively), only one node is necessary, because the other node
will respond conjugately. In this study, the crisp training method was adopted and only one node was used.

An image block of 16 × 16 pixels (i.e., 1.7 × 1.7 mm²) with a convolution kernel size of 6 × 6, which was suggested
by the previous study for the detection of microcalcifications [11], was used in this study. The second layer consists of
a different number of subimages in each of three different experiments. Each group has 12 × 12 pixels formatted in a
square array. The output layer has 2 nodes (groups), which are fully connected to the second layer.

In an earlier study, we found that a two-hidden-layer CNN architecture outperformed the one-hidden-layer
CNN [10], [11]. In this paper, we concentrated on the convolution operation of the CNN. Therefore, only a single
hidden layer consisting of groups of feature maps is used to simplify the study.

III.B.  The  Experiment  for  the  Detection  of  Microcalcifications  on  Mammograms 

We have evaluated the CNN and CNN/WK algorithms in the detection of subtle microcalcifications. A total of
68 mammograms (only 38 of which contain subtle microcalcifications) were digitized by a laser scanner with a pixel
size of 0.105 mm. The initial search prior to the final interpretation by the neural network follows the basic scheme,
which uses background removal and signal extraction methods to pre-scan the mammograms and to extract all possible
suspected areas [7], [17], [18]. After the pre-scan process by the computer program, the 68 digital mammograms
provided 265 true and 1,821 false subtle microcalcifications. Figure 4 shows some of the suspected regions, which may or
may not contain microcalcifications. In this study, a grouped jackknife experiment was performed [19]. The training set
consists of 15 normal and 19 abnormal cases randomly selected from the database, and the testing set consists of the
remaining 15 normal and 19 abnormal cases. A total of 10 combinations of training and test sets were evaluated.
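
The random partition into training and testing cases can be sketched as follows. The case identifiers, the function name `split_cases`, and the fixed random seed are hypothetical; the sketch also assumes that the split is made at the level of whole cases, so that image blocks from one mammogram never appear in both sets.

```python
import random

def split_cases(normal_ids, abnormal_ids, n_train_normal, n_train_abnormal, rng):
    """Randomly split case identifiers into (train, test) lists."""
    normal = list(normal_ids)
    abnormal = list(abnormal_ids)
    rng.shuffle(normal)
    rng.shuffle(abnormal)
    train = normal[:n_train_normal] + abnormal[:n_train_abnormal]
    test = normal[n_train_normal:] + abnormal[n_train_abnormal:]
    return train, test

rng = random.Random(0)                                  # fixed seed for repeatability
normal_cases = ["N%02d" % i for i in range(30)]         # hypothetical identifiers
abnormal_cases = ["A%02d" % i for i in range(38)]
partitions = [split_cases(normal_cases, abnormal_cases, 15, 19, rng) for _ in range(10)]
```

Each of the ten partitions would be used to train one network and to produce one set of test scores for the ROC analysis.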

The image blocks of suspected calcifications were automatically extracted and centered. Prior to the CNN
process, the background of all the image blocks was removed using a three-level wavelet high-pass filtering technique.
Specifically, after extracting each suspected region from the original digital mammogram, a three-level wavelet
transform suggested by Daubechies was used and only the lowest frequency was eliminated prior to reconstructing
the image block. The high-pass filtered image blocks were used as the input of the CNN. The kernels of the CNN/WK
were initialized with Daubechies' 8-tap, 6-tap of a composed trigonometrical function [20], or Haar's 4-tap filters. Each
filter was used repetitively, an additional 7 times for 48 kernels (i.e., 24 for the low pass and 24 for the 3 high pass
filters, which produce 96 subimages) and an additional 14 times for 120 kernels (i.e., 60 for the low pass and 60 for the
3 high pass filters, which produce 240 subimages) in the CNN/WK studies. The same filters were also used for 120 fully
uncorrelated kernels, which produce only 120 subimages.
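
The background removal step (decompose to three levels, zero the lowest-frequency band, reconstruct) can be sketched with a plain-Python Haar transform. This substitutes the 2-tap Haar filter for the Daubechies wavelet used in the study, purely to keep the example short; the function names are hypothetical.

```python
import math

R = math.sqrt(2) / 2

def haar_step(vec):
    """One analysis step: pairwise sums (low) followed by pairwise differences (high)."""
    half = len(vec) // 2
    low = [(vec[2 * i] + vec[2 * i + 1]) * R for i in range(half)]
    high = [(vec[2 * i] - vec[2 * i + 1]) * R for i in range(half)]
    return low + high

def haar_unstep(vec):
    """Inverse of haar_step."""
    half = len(vec) // 2
    out = []
    for i in range(half):
        out.append((vec[i] + vec[half + i]) * R)
        out.append((vec[i] - vec[half + i]) * R)
    return out

def transform_corner(img, size, step):
    """Apply `step` to the rows, then the columns, of the top-left size-by-size corner."""
    for r in range(size):
        img[r][:size] = step(img[r][:size])
    for c in range(size):
        col = step([img[r][c] for r in range(size)])
        for r in range(size):
            img[r][c] = col[r]

def remove_background(block, levels=3):
    """Zero the lowest-frequency band of a multi-level Haar transform and reconstruct."""
    img = [[float(x) for x in row] for row in block]
    n = len(img)
    sizes = [n >> lev for lev in range(levels)]        # e.g. 16, 8, 4
    for size in sizes:
        transform_corner(img, size, haar_step)
    ll = n >> levels                                   # side of the lowest-frequency band
    for r in range(ll):
        for c in range(ll):
            img[r][c] = 0.0
    for size in reversed(sizes):
        transform_corner(img, size, haar_unstep)
    return img

ramp = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
filtered = remove_background(ramp)
```

With the Haar filter, this is equivalent to subtracting from each 8 × 8 region of a 16 × 16 block its own mean value; a longer filter such as Daubechies' gives a smoother background estimate.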

The average NODIs of eight rotated image versions were used to evaluate the performance of the neural
networks using the LABROC program. One must realize that the detection of clustered microcalcifications is clinically
more significant than that of individual calcifications, since clustered microcalcifications (three or more) are a strong
indication of breast carcinoma in radiological diagnosis. Once the NODIs were collected, the clinical criterion was added.
Hence, the computer program rejected suspected clusters containing only one or two calcifications and calculated the
average NODI among the clustered calcifications for the ROC evaluation. The clustering procedure was done by
grouping the detected microcalcifications in a 1 cm² region of the mammogram. Five ROC curves with different CNN
kernels are shown in Figure 5. The labels "nK" and "nWK" represent "n" groups of non-constrained kernels and
wavelet constrained kernels used in the CNN and CNN/WK experiments, respectively. The Az values of the original CNN
and the newly developed CNN/WK were 0.91 and 0.83, respectively, using 24 initial kernels. However, the Az values were
0.89 and 0.90 with the CNN and CNN/WK, respectively, using 60 initial kernels. Both experiments used the
same wavelet kernel for the high frequency components. When all wavelet kernels were uncorrelated, the results
further improved to Az = 0.93 using 120 uncorrelated kernels (CNN/120UWK, 30 initial kernels).
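
The cluster criterion can be illustrated with a simple grid-based grouping. The text does not specify how the 1 cm² regions were formed, so the fixed 10 mm grid, the function name `cluster_scores`, and the sample detections below are all assumptions.

```python
from collections import defaultdict

def cluster_scores(detections, cell_mm=10.0, min_count=3):
    """Group (x_mm, y_mm, nodi) detections into square cells and average the NODI.

    Cells holding fewer than `min_count` detections are rejected.
    """
    cells = defaultdict(list)
    for x_mm, y_mm, nodi in detections:
        cells[(int(x_mm // cell_mm), int(y_mm // cell_mm))].append(nodi)
    return {cell: sum(values) / len(values)
            for cell, values in cells.items() if len(values) >= min_count}

detections = [(1.0, 1.0, 0.9), (2.0, 3.0, 0.7), (4.0, 4.0, 0.8),   # three in one cell
              (25.0, 25.0, 0.95), (26.0, 27.0, 0.6)]               # only two: rejected
scores = cluster_scores(detections)
```

The averaged value per surviving cluster is what would then be passed to the ROC evaluation.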


Figure 5. Five ROC curves representing the performance of different convolution neural networks in the detection of
clustered microcalcifications. Note that CNN/nK and CNN/2nWK have comparable numbers of hidden nodes
and nets in the CNN.

(a) CNN/24K: Az = 0.91; (b) CNN/48WK: Az = 0.83; (c) CNN/60K: Az = 0.89;
(d) CNN/120WK: Az = 0.90; and (e) CNN/120UWK: Az = 0.93.

IV.  Conclusions 

In this experiment, the Az of the CNN/24WK was 0.83, which was lower than the 0.91 of the CNN/24K. However, the
Az was greatly improved to 0.93 when 80 uncorrelated wavelet kernels were used. This may be because only an average
of approximately eight free parameters were available in each kernel of the CNN/WK. On the other hand, the CNN had
6 × 6 (or 36) free parameters in each kernel. We found that 48 groups for the CNN/WK (which has a number of
hidden nodes comparable to the CNN/24K) were not sufficient to extract the necessary features for classification. When
120 kernels were used, the CNN/WK would have sufficient free parameters, which led to a higher ROC performance.
We further discovered that it is important to uncorrelate the kernels to increase the number of free parameters, which
seems to be an essential factor in improving the performance of the convolution neural network.
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Appendix 

As indicated in Eq. (18), the updated weights, $K_{u,v}[t+1]$, of the kernel suggested by the BP at the (t+1)th training
iteration are independent. To properly use this suggestion for making a new wavelet kernel, let us assume that there
exists a set of $h'_u$ such that the sums of the even elements and of the odd elements of the new filter both vanish when
$\sqrt{2}/2$ is subtracted from each of them:

$$g_1(h'_u) = \sum_u h'_{2u} - \sqrt{2}/2 = 0; \qquad (A1)$$

$$g_2(h'_u) = \sum_u h'_{2u+1} - \sqrt{2}/2 = 0; \qquad (A2)$$

$$g_p(h'_u) = \sum_u h'_u \times h'_{u+2n} = \delta_{u,u+2n}, \quad \text{where } p = 3, 4, \ldots, k-3. \qquad (A3)$$

In order to update each 2-D filter to be very close to $K_{u,v}$, a function based on the squared difference is used for the
derivation:

$$f = \sum_{u,v}\left(h'_u h'_v - K_{u,v}\right)^2. \qquad (A4)$$

Here we intend to minimize the function f subject to the constraints indicated in Eqs. (A1), (A2), and (A3). The
Lagrangian multiplier method can be employed to solve this problem by adding $df$ and multiples of $dg_1$ and $dg_2$ to obtain

$$df + \sum_q \lambda_q\,dg_q = 0 \qquad (A5)$$

where d represents the differentiation operation of a function and $\lambda_q$ are the Lagrangian multipliers. The partial
differentiation form of (A5) is given below:

$$\frac{\partial\left(f + \sum_q \lambda_q g_q\right)}{\partial h'_u} = 0 \quad \text{for } u = 0, 1, 2, \ldots, k-1. \qquad (A6)$$

Eqs. (A5), (A7), (A1), and (A2) represent a set of k+2 equations which can solve 2k+3 unknowns, $\lambda_q$ and $h_u$. In this
case, not all $\lambda_q$ need to be determined.
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Introduction 

classified  them. 

Methods 

Two  different  series  were  run  representing  different  settings  of  the  ^ADx  dgorito.  ^Sen^l  .  2W 

mammogram  images  were  analyzed,  to  Senes  2,  set  to  detect  a  miilmura  of  three 

algorithm  were  changed  between  Senes  1  and  2.  In  S^es  1  •  ^  3^^  the  algorithm  to 

threshold. 

Abnormalities seen at the sites of CADx localization were classified as representing artifacts, true positive findings
we  „»mal  non.alciiied  »a.omic  emicKres  „  H,  te 

analysis. 

T„e  CADX  progr™.  wa,  n»  prior  »  ^ 

from  the  clinical  cases  of  the  breast  cancer  scr^^g  •  j^teria  were  met  the  cases  were  assigned  or 

current  and  prior  study  and  to  have  images  t)oth  “ 

not  assigned  to  the  CADx  group  by  selecting  processed  by  the  CADx  program  and  the 

Lumiscan  150  film  scaimer  (Lumisys,  Suim^  ,  )•  y  -.(Uoiogist.  who  by  then  had  interpreted  the 

results  returned  later  that  day  to  the  radiologist  for  assessment  classify  any  identified 

mammograms  for  the  official  clinical  report,  priced  to  and  5  X  lens.  Only 

was  stable,  no  additional  evaluation  was  done  of  this  cluster. 

I„  „»y  of  *=  rites  identffledbytoC^x^goridmtowasMjft™^^^^ 

the  CADx  detection.  We  chose  to  code  these  findmgs  separately.  Because  01  ims,  mere  j 

detections  indicated  than  the  number  of  false  positives  per  ’  perforce  of  the  CADx  program, 

multiple  Ktifacts  o,  combmation,  of  me  “‘I™*'?'*”™  we  l»!ced  at 

and  so  we  chose  to  record  all  fbu^gs.  In  asressmg  *e  num  r  ..ith  the  nonoldum  structures,  we 

each  site  that  was  recorded.  If  at  least  one  nucrocalcification  was  pres  ^  f  i  ^  vp«!  ner  imase  We  did 
Sded  ^ras  a  true  detection  for  calcifications  in  determining  the  number  of  false  Pos^ves  ^age^W^ 

separate calculations for the number of false positives per image using the criteria of 1 or more and 2 or more

microcalcifications in the identified field.
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Findings 

True  negatives,  true  positives,  false  negatives,  false  positives 
True  negatives 


In Series 1, of the mammogram films had no CADx detections and no clusters of calcifications were seen when
the radiologist re-assessed the film. In Series 2, 31% of the mammogram films had no CADx detections and no
clusters of calcifications on film re-assessment.

True  positives 

True positives, as defined in this study, were detections with one or more small benign calcifications or indeterminate
microcalcifications. In this series, the true positive detection rate was 86% in Series 1 and 94% in Series 2 when
measured against a single radiologist's interpretation of the mammographic images with the CADx output and when
using the presence of at least one microcalcification as a true positive detection. Overall, because we recorded
separately each finding in a location identified by the CADx program, 29% of the details found in regions identified
by the CADx program in Series 1 and 27% of the detections in Series 2 were true positives. Vascular
calcifications were considered to be false positives. (Figure 1 demonstrates a true positive detection.)


When tested previously with a proven set of cases, the CADx algorithm performance was 87% true positive
[...] positive clusters per image. (Lo SCB, Chan HP, Lin JS, et al. Artificial convolution
neural network for medical image pattern recognition. Neural Networks. 1995;8:1201-1214.)


True  positive  and  true  negative  findings  combined 


If one combines the true negative and true positive cases, 73% of the mammogram films in Series 1 and 58% of the
films in Series 2 were correctly classified.


False negative detections were defined as cases in which a cluster of microcalcifications was
present on the mammogram film, but was not detected by the CADx algorithm. False negatives occurred in
8% of films in Series 1 and 3% of films in Series 2. (Figure 2 demonstrates a false negative cluster of
microcalcifications (arrow) next to a true positive calcification cluster (arrowhead).)


False positive detections

False positive detections accounted for 71% of the details recorded in Series 1 and 73% of those in Series 2. As
previously stated, a false positive location could have multiple details within it that could explain the detection, and
each was recorded separately.

False positive detections per image

In recording the number of artifacts, we recorded separately each of the types of [...] location indicated by the
CADx program in which [...]. In Series 1 there were 0.7 false positive detections per image and in Series 2 there
were 0.9 false positive detections per image.

If  one  uses  the  criteria  of  two  or  more  calcifications  for  a  true  positive  detection,  in  Series  1,  the  false  positive  rate 
was  0.8  per  image  and  for  Series  2,  the  rate  was  1.0  false  positive  detections  per  image. 
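
The effect of the two criteria on the false positive rate can be illustrated with a short sketch. It assumes each CADx mark is scored by the number of microcalcifications the radiologist saw at that site; the `Detection` record, the function name, and the sample marks are hypothetical and are not study data.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    image_id: str       # film on which the CADx mark appeared
    n_microcalcs: int   # microcalcifications seen by the radiologist at the marked site

def false_positives_per_image(detections, n_images, min_calcs):
    """Count a mark as false positive when it holds fewer than min_calcs microcalcifications."""
    n_false = sum(1 for d in detections if d.n_microcalcs < min_calcs)
    return n_false / n_images

# Hypothetical marks on 4 films (illustrative values only).
marks = [
    Detection("film_a", 0),
    Detection("film_a", 1),
    Detection("film_b", 3),
    Detection("film_c", 0),
    Detection("film_d", 2),
]
fp_rate_1 = false_positives_per_image(marks, n_images=4, min_calcs=1)
fp_rate_2 = false_positives_per_image(marks, n_images=4, min_calcs=2)
print(fp_rate_1, fp_rate_2)
```

Raising the criterion from one to two microcalcifications can only reclassify marks from true to false, so the rate per image cannot decrease, which matches the direction of the change reported above.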


Detailed  analysis  of  false  positive  detections 


Developer artifact


Developer artifact accounted for 2% of detections in [...] defects are small punctate details, often with a halo
around them (Figure 3). Developer sediment [...]


Punctate densities

Punctate densities [...] terminal glandular elements or dilated ducts. They have the size and distribution of small
benign calcifications, but have a lesser radiodensity. In some cases we could not determine whether small
calcifications were present within them. They accounted for 21% of the detections in Series 1 and 37% in Series 2.
Punctate densities are different from [...]. Because of the magnification used for this print, film granularity is
seen. The algorithm did not seem to respond to film granularity, as this was widely present and the algorithm only
selected a few regions, where other findings were present as well.
Grid artifacts


Short parallel lines with more punctate regions of increased optical density are seen in some regions of false positive
detections. They are considered to represent grid artifact from failure of rapid enough movement of the grid. They
were more common in the older mammograms (1994-5) than in the 1996 mammograms. They accounted for 13% of
the false positive detections in Series 1 and 5% in Series 2. They are demonstrated in Figure 5 and localized with
arrows.


Film  defects 

Film  defects  (processor  pick  defects)  are  commonly  seen  in  mammogram  films.  They  were  28%  of  the  false  positive 
detections  in  Series  1  and  25%  of  those  in  Series  2.  They  are  demonstrated  in  Figure  6. 
Fibers

Fibers are short lines, either straight or curved, thought to represent [...] of the breast. They are similar to the
punctate densities and their exact nature is uncertain. They accounted for [...]% of the false positive detections in
Series 1 and 1% of those in Series 2. They are demonstrated in Figure 7 and localized with arrows.
Vascular calcification

Vascular calcification accounted for 4% of the false positive detections in Series 1 and 2% of the false positive
detections  in  Series  2.  They  are  demonstrated  in  Figure  8  (arrow).  The  algorithm  did  not  detect  long  linear 
calcifications,  but  detected  vascular  calcifications  when  they  were  punctate. 


Deodorant


Small white flecks of calcium-like material were seen in the axilla of some patients. The frequency of this was not
recorded, but the algorithm did identify this finding. This is demonstrated in Figure 9 (arrows).


Discussion 

False positive detections by computer aided diagnosis programs are considered to be undesirable. There is a tradeoff
between  the  true  positive  detection  rate  and  the  false  positive  detection  rate.  Our  goal  in  this  project  was  to  determine 
by  careful  analysis  the  several  causes  of  false  positive  detections  in  our  algorithm.  The  results  show  that  image 
quality of the original mammogram is an important determinant of the number of false positives. Image quality
defects accounted for 41% of the false positive detections in Series 1 and 32% in Series 2. These false positive
findings  should  decrease  with  careful  attention  to  film  quality.  Most  of  the  remaining  false  positive  detections  were 
caused  by  normal  punctate  structures  in  the  normal  breast.  These  punctate  defects  resemble  the  appearance  of  very 
small  calcification  clusters,  but  have  a  lower  radiodensity.  Their  size  and  density  appear  to  represent  a  continuum  of 
increasing  size  and  density  that  appears  to  merge  with  true  (benign)  microcalcification  appearance.  We  believe  these 
represent dilated glands or ducts. In some cases we were uncertain whether or not calcifications in dilated ducts were
present  or  if  we  were  seeing  the  dilated  duct  itself.  Previous  reports  on  the  accuracy  of  CADx  programs  for 
microcalcifications  have  indicated  that  vascular  calcifications  and  film  emulsion  defects  were  important  contributors 
to  CADx  false  positives.  In  this  study,  we  have  identified  several  additional  causes  of  CADx  false  positive 
detections. 


Conclusion 

False  positive  detections  in  computer  aided  microcalcification  programs  are  not  random  responses  of  the  computer 
algorithm  to  unknown  features.  Better  understanding  of  their  causes  should  promote  algorithm  modification.  Since 
the  computer  algorithm  is,  in  general,  responding  to  true  punctate  or  short  linear  findings  that  resemble 
microcalcifications,  this  suggests  that  computer  aided  systems  will  function  best  with  high  quality  artifact  free  films 
and  that  computer  detection  systems  may  need  to  be  combined  with  improved  classification  systems  to  decrease  the 
number  of  false  positive  detections. 
Abstract

In  the  clinical  course  of  detecting  masses,  mammographers  usually  evaluate  the  surrounding 
background of a radiodense area when breast cancer is suspected. In this study, we adapted this fundamental
concept  and  computed  features  of  the  suspicious  region  in  radial  sections.  These  features  were  then  arranged 
by  circular  convolution  processes  within  a  neural  network,  which  led  to  an  improvement  in  detecting 
mammographic  masses. 

In  this  experiment,  randomly  selected  mammograms  were  processed  by  morphological  enhancement 
techniques.  Radiodense  areas  were  isolated  and  were  delineated  using  the  region  growing  algorithm  with  a 
valley  blocking  technique.  The  boundary  of  each  region  of  interest  was  then  divided  into  36  sectors  using  36 
equi-angle  dividers  radiated  from  the  center  of  the  area.  Four  features  at  each  section  were  computed:  (1)  the 
radius,  (2)  the  normal  angle  of  the  boundary,  (3)  the  average  gradient  along  the  radial  direction,  and  (4)  the 
gray  value  difference  (i.e.,  contrast)  along  the  radial  direction.  Hence,  144  computed  features  (i.e.,  4  features 
per  sector  for  36  sectors)  were  used  as  input  values  for  the  newly  designed  multiple  circular  path  neural 
network (MCPNN). The neural network is constructed to emphasize the correlation information
associated  with  the  feature  interactions  within  the  angle  and  between  adjacent  angles. 

We  have  tested  this  approach  on  our  research  database  consisting  of  91  mammograms.  The  over-all 
performance  in  the  detection  of  masses  was  0.78-0.80  for  the  areas  (Az)  under  the  ROC  curves  using  the 
conventional  neural  network.  However,  the  performance  was  improved  to  Az  values  of  0.84-0.89  using  the 
multiple  circular  path  neural  network. 

1.  Introduction 

It  is  well  known  that  effective  treatment  of  breast  cancer  calls  for  early  detection  of  cancerous  lesions 
(e.g.,  clustered  microcalcifications  and  masses  associated  with  malignant  cellular  processes)’'^’\  Breast 
masses  appear  as  areas  of  increased  density  on  mammograms.  It  is  particularly  difficult  for  radiologists  to 
detect  and  analyze  a  suspected  area  where  a  mass  is  overlapped  with  dense  breast  tissue.  These  masses  are 
more  readily  seen  as  time  progresses,  but  the  further  the  tumor  has  progressed,  the  lower  the  possibility  of  a 
successful treatment. Therefore, improving today's clinical system to increase the chances of early breast cancer
detection is of vital importance to breast cancer patients.

Several  research  groups  have  developed  computer  algorithms  for  automated  detection  of 
mammographic  masses'*'^  ®'’  *.  At  least  one  of  these  groups  has  also  attempted  to  classify  the  malignant  or 
benign nature of the detected tumors. The results of these detection programs indicate that a high true-positive
(TP) rate can be obtained at the expense of 2 or 3 false-positive (FP) detections per mammogram.
This  FP  rate  is  unacceptably  high  for  mass  detection  in  clinical  practice.  Mammographically,  a  multiplicity 
(more  than  two)  of  similar  benign-appearing  breast  lesions  argues  strongly  for  benignity®’  and,  indeed, 
the  more  masses  that  are  identified,  the  less  chance  that  they  represent  cancer'^.  If  the  computer  indicates 
multiple  detections  on  each  mammogram,  the  radiologist  has  to  seek  out  the  one  mass  that  has 
mammographic  features  that  differ  from  the  others.  The  significant  lesion  may  be  missed  due  to  the 
multiplicity  of  possible  lesions.  We  therefore  believe  that  a  more  useful  and  fundamental  approach  to  CADx 
of  masses  is  to  devise  computer  programs  to  analyze  features  of  a  suspected  mass,  which  are  detected  by  the 
radiologist, and provide feature measures and estimates of the likelihood of malignancy by comparing them with the
computer's database. The computer therefore serves as a second opinion and also provides a reproducible
and  objective  evaluation  of  the  mass.  With  this  aid,  the  radiologist  may  also  increase  his/her  sensitivity  by 
lowering  the  threshold  of  suspicion,  while  maintaining  the  overall  specificity  and  reading  efficiency. 

2.  Clinical  Background  of  Breast  Lesions  and  Technical  Planning  in  Mass  Detection 
2.1  Brief  Description  of  Clinical  Background 

Most  commonly,  breast  cancer  presents  as  a  mass.  The  same  lesion  shows  a  somewhat  different 
picture  from  one  projection  to  the  other.  Difficulties  in  mass  detection  also  vary  with  the  underlying  breast 
parenchyma. In the fatty breast, masses are generally easy to detect. With the dense breast, mass detection
is  more  difficult  and  auxiliary  signs  aid  this  detection.  Breasts  can  contain  one,  several,  or  many  masses. 

When  there  is  one  mass,  the  decision  process  is  based  on  its  size,  shape,  and  margins.  The  larger  the  mass  is 
and  the  less  well  defined  its  margins,  the  greater  the  chance  of  cancer.  When  there  are  several  masses,  one 
looks  at  each,  trying  to  determine  whether  any  has  features  to  suggest  cancer  (poorly  defined,  spiculate, 
unusually  radiodense  for  size)  and  one  also  looks  to  see  whether  any  mass  is  different  in  appearance  from  the 
others.  Multiple  small,  well-defined,  similar  masses  presenting  bilaterally  are  all  likely  to  be  benign.  The 
greater  the  asymmetry,  size,  lack  of  circularity,  edge  unsharpness,  and  radiodensity,  the  more  suspicious. 

Clinical  features  of  breast  masses  are  further  discussed  below: 

Density  -  Malignant  lesions  tend  to  have  greater  radiographic  density  due  to  high  attenuation  and  less 

compressibility  of  cancer  than  normal  tissue.  Radiolucent  lesions  are  typically  benign  and  the 
diagnosis  can  be  made  from  the  mammogram. 

Size  -  If  the  lesion  has  morphological  features  suggesting  malignancy,  it  should  be  considered  suspicious 

regardless  of  the  size.  Isolated  masses  with  non-cystic  densities  greater  than  8  mm  in  diameter  can  be 
malignant.  In  general,  the  larger  a  lesion,  the  more  suspicious  it  is. 

Shape  -  The  more  irregular  the  shape  of  a  lesion,  the  more  likely  the  possibility  of  malignancy.  Lesions  tend 
to  be  round,  ovoid  and/or  lobulated.  Small  and  frequent  lobulations  are  suspicious.  Lesions  in  the 
lateral  aspect  of  the  breast  near  the  edge  of  the  parenchyma  with  a  reniform  shape  and  a  hilar 
indentation or notch usually represent a benign intramammary lymph node. Breast carcinoma hidden
in  the  dense  tissues  can  cause  parenchymal  retraction,  which  possess  different  shapes. 

Margins - The margins of the lesion should be carefully evaluated for areas of spiculation, stellate patterns or ill-defined
regions. Most breast cancers have ill-defined margins secondary to tumor infiltration and
associated  fibrosis.  The  appearance  of  spiculations  and  a  more  diffuse  stellate  pattern  are  almost 
pathognomonic  for  cancer.  Lesions  with  sharply  defined  margins  have  a  high  likelihood  of  being 
benign;  however,  up  to  7%  of  malignant  lesions  can  be  well  circumscribed. 

Recently  these  clinical  features  have  been  adapted  in  a  standard  of  the  American  College  of  Radiology 
(ACR).  This  diagnosis  standard  is  known  as  “Breast  Imaging  -  Reporting  and  Data  System”  (BI-RAD)’\ 

2.2.  Technical  Planning 

In  this  study,  our  goal  was  to  extract  clinically  suspicious  lesions.  The  differentiation  of  benign  and 
malignant  status  was  beyond  the  scope  of  this  work.  Hence,  we  will  only  provide  methods  in  extracting 
potential  lesions  from  glandular  tissue  in  the  following  sections.  (Note  that  lesions  can  be  overlapped  with 
dense  breast  parenchyma.)  The  study  was  conducted  with  the  following  steps:  (1)  use  background  correction 
method  and  morphological  operations  to  extract  radio-opaque  areas,  (2)  delineate  the  boundary  of  the  areas, 

(3)  compute  the  features  and  texture  of  the  masses  with  emphasis  on  the  boundary,  (4)  design  and  plan 
training  strategy  using  a  neural  network  as  classifier  for  the  recognition  of  mass  features.  An  overall 
detection  scheme  of  our  proposed  framework  is  shown  in  Figure  1. 
Figure 1. A flowchart for the detection of masses in this study.


3.  Development  of  Technical  Methods 

3.1. Preprocessing for Image Consistency and Mass Enhancement Using Morphological Operations

A major difficulty in automatic mass detection is that mammographic masses are often overlapped with dense
breast tissues. In such cases, it is necessary to remove the bright background caused by breast tissue to avoid
misdetection of mass signals. For this purpose, background correction is an indispensable technique for mass
detection.

Mathematical morphology is powerful in analyzing and describing geometrical
relations. Essentially it is a formalization of intuitive concepts such as size or shape. The two basic
operations are “erosion” and “dilation,” which are consistently defined for binary and gray-scale
images. The “opening” and “closing” operations are defined as follows:


opening:

$$X_B = (X \ominus B) \oplus B, \tag{1}$$

closing:

$$X^B = (X \oplus B) \ominus B, \tag{2}$$

where $X$ is the image, $B$ is the structuring element, and $\oplus$ and $\ominus$ indicate the
dilation and erosion, respectively. Based on the “opening” operation, we have developed an
operation for background correction. The operation is represented by

$$X - X_B = X - (X \ominus B) \oplus B. \tag{3}$$


Eq. (3) is the subtraction of the image processed by the operator “opening” from the original image. Figure
2 shows the effect of the operation represented by Eq. (3): (a) illustrates a structuring element,
(b) shows the original signal (gray line) and the signal processed by “opening” (black line), and (c) denotes
the output signal of Eq. (3). That is, (c) is the subtraction of the black line signal from
the gray line signal in (b). Note that the detected peak signals were not affected by the operation. Hence the
mass signals detected by the operation retain their original shapes.
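
The background correction of Eq. (3) can be sketched on a one-dimensional signal, in the spirit of Figure 2. This is a minimal illustration that assumes a flat structuring element, so that erosion and dilation reduce to sliding minimum and maximum filters; the function names and the sample profile are hypothetical and are not the implementation used in the study.

```python
import numpy as np

def erode(signal, width):
    """Gray-scale erosion by a flat structuring element: sliding-window minimum."""
    pad = width // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([padded[i:i + width].min() for i in range(len(signal))])

def dilate(signal, width):
    """Gray-scale dilation by a flat structuring element: sliding-window maximum."""
    pad = width // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([padded[i:i + width].max() for i in range(len(signal))])

def background_correct(signal, width):
    """Eq. (3): subtract the opened signal from the original signal."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the window is centered")
    opened = dilate(erode(signal, width), width)
    return signal - opened

# Hypothetical profile: flat tissue (5), a broad bright plateau (+20),
# and a narrow 5-sample peak (+10) riding on the plateau.
profile = np.full(60, 5.0)
profile[10:50] += 20.0
profile[28:33] += 10.0

residual = background_correct(profile, width=11)
print(residual[26:35])
```

In this example the plateau is wider than the 11-sample structuring element and is removed as background, while the 5-sample peak is narrower and survives with its original height.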

The detection of a peak depends significantly on the size of the structuring element: only peaks that are
smaller than the structuring element can be detected. In our mass detection scheme, a 52 pixel-diameter
structuring element will be used to detect masses whose sizes are less
than 52 pixels in diameter. An object with a diameter of 52 pixels in a 512×625 pixel reduced image occupies
250 pixels in its original digitized image, and its real size is expected to be about 2.5 cm.

Figure  2.  Effect  of  operation  in  Eq.  (3):  (a)  structuring  element,  (b)  original  signal  (gray  line)  and  signal 
after  opening  (black  line),  and  (c)  output  signal  of  operation  in  Eq.  (3). 


3.2.  Feature  Extraction  of  Masses 

Feature  extraction  methods  have  played  essential  roles  in  many  pattern  recognition  tasks.  Once  the 
features associated with an image pattern are extracted accurately, they can be used to distinguish one class
of  patterns  from  the  others.  Recently,  many  investigators  have  found  that  the  multilayer  perceptron  neural 
network  using  the  error  back  propagation  training  technique  is  a  very  powerful  tool  to  serve  as  an  analyzer 
(or  classifier).  Recently,  the  back  propagation  neural  network  (BPNN)  for  classification  of  features  has 
widely  been  used  in  the  field  of  computer-aided  diagnosis''*  *^  ’®  ’’  *®. 

The  success  of  using  an  analyzer  for  a  pattern  recognition  task  would  rely  on  two  issues:  (a)  selected 
features  that  could  describe  discrepancy  between  patterns  and  (b)  accuracy  of  the  feature  computation. 
Should  either  one  fail,  no  analyzer  or  classifier  would  be  able  to  achieve  the  expected  performance.  By 
analyzing  many  clinical  samples  of  various  sizes  of  masses,  we  found  that  the  peripheral  portion  of  the  mass 
plays  an  important  role  for  mammographers  to  make  a  diagnosis.  The  mammographer  usually  evaluates  the 
surrounding  background  of  a  radiodense  area  when  breast  cancer  is  suspected. 

We,  therefore,  performed  boundary  detection  of  the  suspected  masses  on  the  morphologically 
enhanced  mammogram.  A  region  growing  with  valley  blocking  technique  was  employed  to  delineate  all  the 
suspected  areas.  Then,  the  boundary  was  divided  into  36  sectors  (i.e.,  10°  per  sector)  using  36  equi-angle 
dividers  radiated  from  the  center  of  suspicious  area.  The  following  features  were  computed  within  each  10° 
sector  of  the  area: 

(a) "l" - the length from the center of mass to the shortest boundary segment.

(b)  "a"  -  the  normal  angle  of  the  boundary  segment  (or  the  value  of  cos(a)). 

(c)  "g"  -  the  average  gradient  of  gray  value  on  the  segment  along  the  radial  direction. 

Technically  speaking,  this  set  of  gradient  values  may  also  serve  as  a  fuzzy  system  for  the  input 
layer  in  the  neural  network  to  be  described. 

(d)  "c"  -  the  gray  value  difference  (i.e.,  contrast)  along  the  radial  direction. 

(average gray value (b_i) calculated from the mass area located at "l"/3 inside the boundary and the
average background value (b_o) calculated from the peripheral area near "l"/3 outside of the
suspicious area).

Hence,  a  total  of  144  computed  features  (4  features/sector  for  36  sectors)  can  be  used  as  input  values  for  the 
analysis of suspicious areas. The relationships between the computed features and BI-RADS descriptors are
discussed  below: 

(1)  Mass  Size  - 

The 36 "l" values would provide sufficient data for the neural network to determine the size.

(2) Mass Shape (round, oval, lobulated, or irregular) -

The 36 "l" and 36 "a" values could approximate the shape of a mass.

(3) Mass Margin (circumscribed, microlobulated, obscured, ill-defined, or spiculate) -

The 36 "g" and 36 "l" values should be able to describe the characteristics of the mass margin.

(4)  Mass  Density  (fat-containing,  low  density,  isodense,  or  highly  dense)  - 

The  36  "c"  and  36  "g"  values  would  be  able  to  describe  the  density  of  the  mass. 
In short, the selected features match the BI-RADS descriptors closely. The reason for using 36
values for each nominated feature is four-fold: (a) because the mass boundary varies, it is difficult to describe an image
pattern  using  a  single  value;  (b)  due  to  the  general  shape  of  the  masses,  the  features  of  masses  can  be  easily 
analyzed  by  the  polar  coordinate  system;  (c)  in  case  some  features  are  inaccurately  computed  in  several 
directions  due  to  the  structure  noises,  such  as  the  breast  slender  lines,  there  may  still  exist  a  sufficient 
number  of  correct  features;  (d)  generally  more  accurate  results  can  be  produced  by  using  subdivided 
parameters  rather  than  using  global  parameters  in  a  pattern  recognition  task.  Other  computational  features 
(e.g.,  difference  entropy  and  other  higher  order  features)  are  eligible  but  require  further  investigation. 
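
The sector layout described in this section can be sketched as follows. The example computes only the "l" feature (the shortest center-to-boundary distance in each 10° sector) and fills the other three features with placeholders. The function names, the interleaved ordering of the 144 values, and the circular test boundary are assumptions made for illustration.

```python
import numpy as np

N_SECTORS = 36  # 10 degrees per sector, as in the text

def sector_radii(boundary_xy, center_xy, n_sectors=N_SECTORS):
    """Shortest center-to-boundary distance within each angular sector."""
    offsets = np.asarray(boundary_xy, dtype=float) - np.asarray(center_xy, dtype=float)
    distances = np.hypot(offsets[:, 0], offsets[:, 1])
    angles = np.arctan2(offsets[:, 1], offsets[:, 0]) % (2.0 * np.pi)
    sectors = np.minimum((angles / (2.0 * np.pi / n_sectors)).astype(int), n_sectors - 1)
    radii = np.full(n_sectors, np.nan)  # NaN marks a sector with no boundary point
    for s in range(n_sectors):
        in_sector = sectors == s
        if in_sector.any():
            radii[s] = distances[in_sector].min()
    return radii

def assemble_features(l_vals, a_vals, g_vals, c_vals):
    """Interleave the four per-sector features into one 144-element input vector."""
    return np.stack([l_vals, a_vals, g_vals, c_vals], axis=1).reshape(-1)

# Hypothetical boundary: a circle of radius 20 pixels sampled every degree.
theta = np.deg2rad(np.arange(360) + 0.5)
boundary = np.column_stack([100 + 20 * np.cos(theta), 80 + 20 * np.sin(theta)])
l_values = sector_radii(boundary, center_xy=(100, 80))
placeholder = np.zeros(N_SECTORS)  # stand-ins for "a", "g", "c", not computed here
features = assemble_features(l_values, placeholder, placeholder, placeholder)
print(features.shape)
```

Keeping the four values of a sector adjacent in the vector is one convenient choice, since a later step can then move from one sector to the next by stepping four input nodes at a time.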

3.3.  The  Neural  Network  Structures  Specifically  Designed  for  the  Extracted  Boundary  Features 

(A)  Multiple  path  with  circular  networking  to  instruct  the  neural  network  in  analyzing  sector  features 

We  designed  several  neural  network  connections  between  the  input  and  the  first  hidden  layers  as 
shown  in  Figure  3.  Figure  3(a),  (b),  and  (c)  illustrate  the  full  connection,  a  self  correlation  (SC)  networking, 
and a neighborhood correlation (NC) networking, respectively. Note that the input and hidden nodes should
be  completely  matched  when  combining  more  than  one  path  in  the  study.  In  this  case,  the  correlation  layers 
only  function  as  branch  connections  between  input  and  hidden  layers.  When  using  NC  paths,  networking 
engagement  within  multiple  sectors  (e.g.,  20°,  30°,  40°  and  50°  of  the  neighborhood  correlation)  can  be 
grouped.  The  method  of  using  the  multiple  correlation  connections  was  motivated  by  our  two-dimensional 
convolution  neural  network  (2-D  CNN)  research  experience  where  we  found  that  more  than  10  multiple 
convolution kernels were necessary to achieve an outstanding neural network performance in the detection of
lung  nodules  and  microcalcifications'^. 
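
A rough sketch of one SC or NC path is given below. It treats the path as a circular convolution in which every sector shares the same kernel, with a span of one sector for SC and five adjacent sectors for a 50° NC path. The `tanh` activation, the random weights, and the function name are assumptions and are not taken from the MCPNN implementation.

```python
import numpy as np

def circular_path(features, kernel, bias=0.0):
    """Apply one shared-weight kernel around the ring of sectors.

    features: (n_sectors, 4) array of per-sector (l, a, g, c) values.
    kernel:   (span, 4) weights; span=1 gives an SC path, span=5 a 50-degree NC path.
    """
    n_sectors = features.shape[0]
    span = kernel.shape[0]
    out = np.empty(n_sectors)
    for s in range(n_sectors):
        # Modular indexing wraps the neighborhood, so sector 35 is adjacent to sector 0.
        idx = (s + np.arange(span) - span // 2) % n_sectors
        out[s] = np.sum(features[idx] * kernel) + bias
    return np.tanh(out)  # activation chosen for illustration only

rng = np.random.default_rng(0)
features = rng.normal(size=(36, 4))   # hypothetical sector features
sc_kernel = rng.normal(size=(1, 4))   # self correlation: one sector per node
nc_kernel = rng.normal(size=(5, 4))   # neighborhood correlation: five adjacent sectors

sc_out = circular_path(features, sc_kernel)
nc_out = circular_path(features, nc_kernel)

# Rotating the input by three sectors rotates the path output by three sectors.
rotated_out = circular_path(np.roll(features, 3, axis=0), nc_kernel)
print(np.allclose(rotated_out, np.roll(nc_out, 3)))
```

Because the weights are shared and the indexing wraps around, the path has no preferred starting sector: a rotated mass produces the same node outputs in rotated order.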

Compared  to  2-D  CNN  systems,  the  required  computation  using  1-D  input  features  (i.e.,  144)  is 
relatively  small.  The  combination  of  the  networking  paths  described  earlier  for  MCPNN  was  implemented 
using  C  programming  language.  The  internal  computation  algorithm  used  in  the  MCPNN  shares  the  same 
convolution process as that in the 2-D CNN. One additional training method using flipping invariance was
employed and is described as follows.

(B)  Training  methods  and  the  utilization  of  characteristics  of  flipping  invariance  of  the  features 

Because  we  used  the  circular  paths,  there  were  no  starting  and  ending  sectors.  The  forward  and  back 
propagation  computation  can  be  started  from  any  sector.  Since  the  mass  characteristics  of  the  flipped  patch 
remained  the  same,  we  flipped  each  patch  in  the  training  set  and  kept  the  same  numerical  value  for  the  target 
output. Since we designed a 10° increment for each rotation, each SC or NC networking would need to
process the computed feature set 36 times for each patch. To simplify this network computation,
we shifted one small set (4 nodes) on the input layer at a time to conduct the circular convolution process with
the  SC  and  NC  kernels.  By  reversing  the  sequence  of  the  sector,  we  can  train  the  flipped  version  of  the 
suspicious  masses.  Hence,  the  characteristics  of  flipping  invariance  literally  increase  the  number  of  the 
training  set  by  a  factor  of  2.  The  flipping  procedure  was  also  used  for  the  BPNN  experiment  described  below. 
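
The flipping step can be sketched as a simple augmentation. The sketch assumes that a flipped patch is obtained by reversing the order of the sectors while leaving the per-sector values and the target output unchanged; the function names and sample patches are hypothetical.

```python
def flip_sectors(sector_features):
    """Mirror a patch by reversing the order of its sectors."""
    return list(reversed(sector_features))

def augment_with_flips(training_set):
    """Return the original samples plus a flipped copy of each, targets unchanged."""
    augmented = []
    for sector_features, target in training_set:
        augmented.append((sector_features, target))
        augmented.append((flip_sectors(sector_features), target))
    return augmented

# Hypothetical training set: two patches with 36 sectors of (l, a, g, c) each.
patch_a = [(float(s), 0.0, 0.0, 0.0) for s in range(36)]
patch_b = [(1.0, 0.0, 0.0, float(s)) for s in range(36)]
training_set = [(patch_a, 1.0), (patch_b, 0.0)]
augmented = augment_with_flips(training_set)
print(len(training_set), len(augmented))
```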

3.4.  Summary  of  Feature  Extraction  Methods  and  the  MCPNN 

We  have  described  our  approach  on  the  feature  extraction,  the  design  of  MCPNN,  and  its 
corresponding  training  method.  Figure  4  shows  a  flow  diagram  of  the  proposed  method.  Since  the  MCP  only 
alters  the  input  data  connection  from  the  input  to  the  first  hidden  layer,  any  learning  algorithm  can  be  applied 
within  the  neural  network.  For  simplicity,  we  used  the  back  propagation  algorithm  for  both  the  conventional 
and  proposed  neural  network  systems  in  the  following  experiments. 
Figure 3. Three types of network paths connecting the input and the hidden layers:

(a)  Full  connection. 

(b)  A  self  correlation  (SC)  path;  each  node  on  the  layer  connects  to  a  single  set  of  the  features  (l,a,g,c)  for  the 
fan-in  and  fully  connects  to  the  hidden  nodes  for  fan-out. 

(c)  A  neighborhood  correlation  (NC)  path;  each  node  on  the  layer  connects  to  five  adjacent  sets  of  the 
features  for  the  fan-in  and  fully  connects  to  the  hidden  nodes  for  fan-out. 

Note  that  the  fan-in  nets  emphasizing  self  correlation  in  (b)  and  neighborhood  correlation  in  (c) 
represent  convolution  weights  (i.e.,  the  same  type  of  sectors  possess  the  same  set  of  weighting  factors). 
Figure 4. A flow chart, involving the MCPNN and sector features of masses, was used in the following study.


4.  Experiments  and  Results 

We selected 91 mammograms and digitized each mammogram with a computer format of 2048 ×
2500 × 12 bits (for an 8" × 10" area where each image pixel represents 100 µm square). No two mammograms
were taken from the same patient film jacket. All the digitized mammograms were miniaturized to
512 × 625 × 12 bits using 4 × 4 pixel averaging and were processed by the above methods to perform mass
detection. Based on the corresponding biopsy reports, one experienced radiologist read all 91 mammograms
and identified 75 areas containing masses. (Note that the reports recorded the malignancy of the biopsy
specimens. The radiologist only used them as reference for the identification of masses.) Through the
pre-process and the first step screen based on the circularity test, a total of 125 suspicious areas were extracted
from the 91 digitized mammograms.
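
The image reduction step can be sketched as block averaging. The sketch assumes that the 2048 × 2500 image was reduced to 512 × 625 by averaging non-overlapping 4 × 4 pixel blocks; the function name is hypothetical, and the rounding back to 12-bit integers is omitted.

```python
import numpy as np

def block_average(image, factor=4):
    """Reduce an image by averaging non-overlapping factor x factor blocks."""
    rows, cols = image.shape
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    blocks = image.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Stand-in for a digitized mammogram stored as 2500 rows by 2048 columns.
digitized = np.zeros((2500, 2048), dtype=np.uint16)
reduced = block_average(digitized, factor=4)
print(reduced.shape)

# A single 4 x 4 block holding 0..15 averages to 7.5.
small = block_average(np.arange(16.0).reshape(4, 4), factor=4)
print(small)
```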

4.1.  Experiment  1 

We randomly selected 54 computer-segmented areas where 30 patches were matched with the
radiologist's identification and 24 were not. This database was used to train two neural network systems: (1)
a conventional 3-layer BP neural network (with 125 nodes in the hidden layer) and (2) the proposed MCP
training method using the same neural network learning algorithm. The structure of the MCPNN was
described earlier; however, we used one fully connected path, four SC paths, four 20° NC paths, four 30°
NC paths, three 40° NC paths, and two 50° NC paths in the first step network connection for the MCPNN.
Both neural network systems were trained by the error back propagation algorithm by feeding the features
from the input layer and registering the corresponding target value at the output side. Once the training of
the neural networks was complete, we then used the remaining 71 computer-segmented areas for the testing.
None of the images and their corresponding patients in the testing set could be found in the training set. The
neural network output values were fed into the LABROC program for the performance evaluation. The
results indicated that the areas (Az) under the receiver operating characteristic (ROC) curves were 0.781 and
0.844 using the conventional BPNN and the MCPNN, respectively. The ROC curves of these two neural
network training methods are shown in Figure 5(A). We also invited another senior mammographer to
conduct an ROC observer study. The mammographer was asked to rate each patch using a numerical scale
ranging 0-10 for its likelihood of being a mass. These 71 numbers were also fed into the LABROC program.
The mammographer's performance in Az on this set of test cases was 0.909. The corresponding ROC curve
is also shown in Figure 5(A).
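
For readers who want to see how an area under the ROC curve is obtained from a set of ratings, a minimal empirical estimate is sketched below. It is not the LABROC procedure, which fits a smooth model to the ratings; this sketch uses the nonparametric pairwise-comparison estimate instead, so its values would generally differ somewhat from a fitted Az. The scores and labels are hypothetical.

```python
def empirical_auc(scores, labels):
    """Fraction of (mass, non-mass) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical network outputs for five test patches (1 = mass, 0 = not a mass).
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = empirical_auc(scores, labels)
print(round(auc, 3))
```

The same function applies to the observer ratings on the 0-10 scale, since only the rank order of the ratings matters.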

4.2.  Experiment  2 

We also conducted a leave-one-case-out experiment using the same database. In this experiment, we
used the patches extracted from 90 mammograms for training and used the patches (in most cases a
single patch) extracted from the remaining mammogram as test objects. The procedure was repeated 91 times
to allow every suspicious patch from each mammogram to be tested in the experiment. For each individual
suspicious area, the computed features were identical to those used in Experiment 1. Again, both neural
network systems were independently evaluated with the same procedure. The results indicated that the Az
values were 0.799 and 0.887 using the conventional back propagation neural network and the MCPNN,
respectively. Figure 5(B) shows the ROC curves of these two neural network systems using the
leave-one-case-out procedure in the experiment.
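
The leave-one-case-out procedure of Experiment 2 can be sketched as a generator that holds out all patches of one mammogram per round. This is a minimal sketch under assumed data structures: the tuple layout, the function name, and the toy data are hypothetical, and the training and scoring of the networks are left out.

```python
from collections import defaultdict


def leave_one_case_out(patches):
    """Yield (held_out_id, train, test) splits, one per mammogram.

    `patches` is a list of (mammogram_id, features, label) tuples.
    All patches from the held-out mammogram go to the test set, so no
    image contributes to both training and testing in the same round.
    """
    by_case = defaultdict(list)
    for patch in patches:
        by_case[patch[0]].append(patch)
    for held_out in sorted(by_case):
        test = by_case[held_out]
        train = [p for cid in sorted(by_case) if cid != held_out
                 for p in by_case[cid]]
        yield held_out, train, test


# Hypothetical toy data: three mammograms, one of them with two patches.
toy = [("m1", [0.2], 1), ("m2", [0.7], 0), ("m2", [0.9], 1), ("m3", [0.4], 0)]
splits = list(leave_one_case_out(toy))
```

With 91 mammograms this loop would run 91 times, and the network outputs collected over all rounds would then be pooled for a single ROC analysis.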
Figure 5, panel (A): ROC Curves of the Mammographer and Two Different Neural Network Training Methods in Experiment 1.

Figure 5, panel (B): ROC Curves of the Two Different Training Methods in Experiment 2.


Figure  5.  The  ROC  curves  obtained  from  corresponding  experiments. 

(A)  The  left  figure  shows  that  the  performance  of  MCPNN  training  method  is  superior  to  that  of  the 
conventional  input  method.  The  highest  curve  is  the  ROC  performance  of  the  senior  mammographer. 

(B)  The  right  figure  shows  similar  results  with  a  higher  performance  using  the  leave-one-case-out 
procedure  as  described  in  Experiment  2. 


5.  Conclusions  and  Discussion 

Through this study, we found that the selected features are somewhat effective in the detection of
masses. These features were “computationally translated” from the qualitative descriptors of
BI-RADS. Another distinctive aspect of this study was the test of our newly developed MCP training method.
In Experiment 1, we found that the performances of both neural network systems were increased. This might
be due to the increased number of cases (from 54 to 124) in the training set. In Experiment 2, the Az value
improved by 0.043 (from 0.844 to 0.887) using the MCPNN training method, which was higher than the Az
improvement of 0.018 (from 0.781 to 0.799) obtained by the conventional training method. The results implied
that the MCPNN learned more effectively than the conventional BP when the number of training cases was increased.

It is known in the field of artificial intelligence that the key factors in pattern recognition are: (1)
effective methods for the extraction of features and (2) analytic methods (e.g., back propagation neural
network) for the extracted features. In this study, we showed that the training method designed to guide the
analyzer is also an important factor in the success of a pattern recognition task. Though this finding is not
new, the trend of developing training methods for various pattern recognition tasks has not been established in the
field of pattern recognition. In this work, we demonstrated that organized features with proper network
connection and task-oriented guidance would assist the neural network in performing the task.

As far as the research in recognition of masses is concerned, we believe that the main concept of using
sectors is an effective approach. Note that any features arranged in the polar coordinate system can be
trained by the MCP method. Since the MCP only coordinates the input data, the internal neural network
learning algorithm can be replaced by other learning algorithms. A technique using the rubber band
straightening transformation, independently developed by Sahiner, for the detection of masses also employs
a similar concept in extracting features and/or texture in polar coordinates. We believe that integration of
effective feature and texture values computed at small sectors will be the research trend in mass detection.

[...] with glandular tissues, a significant part of the mass may be
obscured and is unrecoverable by digital image processing techniques. By reviewing those failure cases we
[...] were in this category. However, these cases were correctly
identified by the radiologists. This implies that we need to find a way to train the neural network to
recognize those cases with sufficient sectors showing signs of masses. Further research based on this pilot
study is planned and the results will be reported shortly by the authors.
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ABSTRACT 

Since an image data compression technique is usually associated with a low-pass filter, the unsharpness of
calcifications and edges is of clinical concern in mammography. The same effect may turn film defects into
calcification-like spots and could produce false-positive detections by the radiologist. In this study, we employed a highly sensitive
calcification detection system to guide an S+P integer wavelet compression, so that the data fidelity of calcifications or
unknown spots is fully preserved. The prediction component of the S+P decomposition is based on Daubechies' D8.

Our results indicated that the modified CAD program detected an average of 1,193 potential calcifications on CC
view mammograms and an average of 948 potential calcifications on MLO view mammograms. Compressed
data rates from 0.1 to 0.43 bit/pixel were studied. The compressed images were evaluated by subjective comparison
studies. The results indicated that no difference could be observed between the original and the 0.43 bit rate decompressed
images. The radiologist identified 20% of the compressed images at 0.1 bit rate as suffering from minor blurry artifacts and 6%
of the compressed images as possessing greater edge sharpness. Without lossless compression for microcalcifications, the
radiologist identified 20% of the microcalcifications on the compressed mammograms at 0.1 bit rate as suffering from minor
compression artifacts.

Keywords:  Compression,  computer-aided  diagnosis,  wavelet,  mammography,  microcalcifications,  just  noticeable  difference 


1.  INTRODUCTION 

The recent advancements in high-speed digital computers and networking, as well as the gradual acceptance of
high-resolution digital radiographic systems, have revived interest in the development of digital radiography, including
mammography, for routine clinical use. Currently, it is possible to obtain a digital mammogram having high spatial resolution
by digitizing screen-film images with a laser digitizer or with direct digital systems.

The research and development of teleradiology and telemammography systems has progressed through many
technical and clinical endeavors. The clinical utilization of teleradiology systems is not known with regard to workloads,
reliability,  and  clinical  protocols.  The  selection  of  efficient  and  cost-effective  wide-area  networks  for  various  applications  is 
presently  more  an  art  than  a  science.  In  this  area,  two  technical  problems  remain:  (a)  no  model  exists  by  which  radiologists 
can  apply  the  experience  of  others  to  design  and  implement  a  teleradiology  system;  (b)  teleradiology  systems  have  not  been 
studied  for  use  in  research  and  education. 

Since a large amount of computer storage (10 to 40 Mbytes) is required for a mammogram, it takes a long time for an
economical channel to transmit the image. It would take ~5 hours to transmit an uncompressed mammogram and about 1
hour to transmit the breast area with losslessly compressed data. If an Ethernet is used, the transmission speed can be increased by a
factor of 15, which is still too slow for clinical use. In this study, we used a combined technique that integrates an integer
wavelet compression technique and a highly sensitive computer-aided detection (CAD) process to compress digital
mammograms. The decompressed mammograms would be error-free at all small bright spots, including calcifications.
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2.  CAD-GUIDED  COMPRESSION  SCHEME  FOR  DIGITAL  MAMMOGRAPHY 

We randomly selected 100 mammograms from our clinical database for this study, of which 50 were CC view and
the other 50 were MLO view mammograms. Each of these mammograms contains isolated and/or clustered
microcalcifications. Each mammogram was digitized by a Lumisys laser scanner (LumiScan Model 150) at 100 microns per
pixel, so that each digitized mammogram occupies 1,792x2,560x16 bits of computer storage. However, only 12 of the 16 bits
were used to store the digital data for each pixel.

As a preprocessing step, the boundary of the breast on the mammogram was first delineated. The compression
method and the CAD detection program were applied only to the area within the breast boundary. The CAD system developed by
the [...] Center of Georgetown University Medical Center was used in conjunction with the compression
scheme. The CAD system was modified from an existing CAD program to identify calcifications and local maxima on the
mammogram. The existing CAD system consists of six major components: (1) delineation of the breast area on the mammogram,
(2) high-pass enhanced filtering for reduction of breast parenchyma and enhancement of calcifications, (3) extraction of the local
maximum intensity as suspected calcification, (4) computing features (e.g., size, shape, contrast, etc.) on the original
mammogram, (5) applying a convolution neural network for the recognition of the calcifications using a Gaussian as the
activation function, and (6) clustering the suspected spots. The first three steps of this detection scheme were adapted to form
a highly sensitive CAD system. For each detection, a region of interest (ROI) containing 10x10 pixels centered at the
detected spot was subjected to lossless compression, separately from the overall compression of the mammogram.
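
The ROI step described above can be pictured with a small sketch. It assumes a simple 8-neighbour local-maximum test with a hypothetical intensity threshold in place of the high-pass filtering and detection rules of the actual CAD system, which the text does not specify. The function names, the threshold, and the toy image are illustrative only.

```python
def local_maxima(image, threshold):
    """Return (row, col) of pixels that exceed `threshold` and are
    strictly greater than all of their (up to 8) neighbours."""
    rows, cols = len(image), len(image[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = image[r][c]
            if v <= threshold:
                continue
            neighbours = [image[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def roi_mask(shape, peaks, size=10):
    """Mark a size x size ROI around each peak, clipped at the borders.

    With an even size the window cannot be exactly centred; here it
    extends size // 2 pixels up/left and size // 2 - 1 pixels down/right.
    True pixels would be routed to the lossless path.
    """
    rows, cols = shape
    mask = [[False] * cols for _ in range(rows)]
    half = size // 2
    for r, c in peaks:
        for rr in range(max(0, r - half), min(rows, r - half + size)):
            for cc in range(max(0, c - half), min(cols, c - half + size)):
                mask[rr][cc] = True
    return mask


# Toy 16x16 image with one bright spot (hypothetical values).
img = [[100] * 16 for _ in range(16)]
img[8][8] = 900
peaks = local_maxima(img, threshold=500)
mask = roi_mask((16, 16), peaks)
lossless_pixels = sum(sum(row) for row in mask)
```

Overlapping ROIs simply merge in the mask, so closely spaced detections do not add to the lossless area twice.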

We used an integer wavelet transform (S+P algorithm) to decompose the whole mammogram, followed by a linear
quantization process and arithmetic coding to encode the quantized wavelet coefficients. The prediction component of the
integer wavelet is an approximated version of Daubechies' D8. The suspected calcification spots identified by the CAD
system were compressed by the same wavelet without the quantization. Figure 1 illustrates this compression scheme.
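
To illustrate why the path without quantization is lossless, the following minimal sketch implements a one-level, one-dimensional S transform (the integer average/difference step on which the S+P transform builds) and a uniform quantizer. It omits the prediction (P) step, the multi-level two-dimensional decomposition, and the arithmetic coder. The quantizer step size and the sample values are hypothetical.

```python
def s_transform(x):
    """One-level integer S transform of an even-length integer list.

    low[i]  = floor((x[2i] + x[2i+1]) / 2)   (truncated average)
    high[i] = x[2i] - x[2i+1]                (difference)
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    low = [(x[i] + x[i + 1]) // 2 for i in range(0, len(x), 2)]
    high = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return low, high


def inverse_s_transform(low, high):
    """Exactly invert s_transform using integer arithmetic only."""
    x = []
    for l, h in zip(low, high):
        a = l + (h + 1) // 2  # floor division recovers the even sample
        x.extend([a, a - h])
    return x


def quantize(coeffs, step):
    """Uniform quantization followed by reconstruction (lossy)."""
    return [int(round(c / step)) * step for c in coeffs]


pixels = [1200, 1203, 1198, 2047, 0, 5, 4095, 4094]  # hypothetical 12-bit values
low, high = s_transform(pixels)

# Lossless path: coefficients are kept as they are.
restored = inverse_s_transform(low, high)

# Lossy path: high-pass coefficients quantized with a hypothetical step.
approx = inverse_s_transform(low, quantize(high, step=8))
```

Because every step uses integers, the unquantized coefficients reproduce the input exactly, while the quantized ones only approximate it.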




Figure  1:  A  CAD  guided  compression  scheme  based  on  integer  wavelet  decomposition. 


3.  DESCRIPTION  OF  OBSERVER  PERFORMANCE  STUDIES 

[...] the study radiologist to read a set of images at a time. Each set of images is a pair of the original and one
[...]

The three compression modes are: (i) 0.3 bit/pixel in file A with lossless compression for suspected
calcifications in file B, (ii) 0.1 bit/pixel in file A with lossless compression for suspected calcifications in file B, and (iii) 0.1 bit/pixel in file
A only. A questionnaire consisting of four sections of quality measures was used for each comparison of a pair of images.

[...] of decompressed and original images were displayed on a Compaq computer monitor. The left and right
positions of the images in each pair were randomly assigned. Three basic image functions (i.e., window/level, pan, and zoom) were
provided for the radiologist to adjust viewing parameters. The radiologist was asked to rate image quality in four sections: (1)
calcification observability, (2) edge sharpness, (3) overall image quality, and (4) noise appearance. A four-section
questionnaire for each pair of images was used, as shown in Figure 2.

Letters “L” and “R” indicate that the left and right sides rank higher for the measured quality, respectively. A
nonzero score indicates that one side of the image pair has either slightly (for L1 or R1), moderately (for L2 or R2), or significantly
(for L3 or R3) better quality or less noise than the other side. A score of “0” indicates that the pair of images has identical
image quality or noise appearance. If there is a noticeable difference between images that are scored “0” on the measured
quality, the radiologist would check the “yes” box below the “0” score. If a noticeable difference is identified, the radiologist
would check one of the lowest boxes that indicates the favored side of the image.
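
Because the left/right placement is randomized, each response has to be mapped back to the original/compressed pairing before it can be tallied. A minimal sketch of that bookkeeping follows, assuming responses are recorded as strings such as "L2", "R1", or "0". This encoding and the function name are hypothetical, since the paper does not describe how the responses were stored.

```python
def decode_score(response, original_side):
    """Convert a questionnaire response into a signed score.

    Positive values mean the original ranked higher on the measured
    quality, negative values mean the compressed image ranked higher,
    and 0 means no difference. `original_side` is "L" or "R".
    """
    if original_side not in ("L", "R"):
        raise ValueError("original_side must be 'L' or 'R'")
    if response == "0":
        return 0
    side, level = response[0], int(response[1:])
    if side not in ("L", "R") or level not in (1, 2, 3):
        raise ValueError("unexpected response: " + response)
    return level if side == original_side else -level


# Hypothetical responses for three image pairs.
scores = [decode_score("L1", "L"), decode_score("R2", "L"), decode_score("0", "R")]
```

The “yes” box and the favored-side boxes for pairs scored “0” would need an extra field. They are left out here to keep the sketch short.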


Figure  2:  A  questionnaire  for  qualitative  measures  for  a  pair  of  images. 


4.  RESULTS 

In this study, the modified CAD program detected an average of 1,193 bright spots in CC view mammograms and an
average of 948 bright spots in MLO view mammograms. Figures 3 and 4 show two sample mammograms, their
compressed counterparts, and the subtracted images amplified 100 times.

The average compression ratios and computed mean-square-errors are shown in Table I. Table II illustrates the
results of the radiologist's qualitative measures by comparing the original and compressed image pairs.
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Figure  3:  (A)  A  CC  view  mammogram,  (B)  its  compressed  image  at  0.4  bit/pixel,  and  (C)  an  enhanced  subtraction  image 
resulting from (A)-(B). The uniform squares in (C) result from the lossless compression at the CAD-detected areas.


Figure 4: (A) An MLO view mammogram, (B) its compressed image at 0.41 bit/pixel, and (C) an enhanced subtraction image
resulting from (A)-(B). The uniform squares in (C) result from the lossless compression at the CAD-detected areas.


Table II shows that no difference could be observed between the original and the 0.43 bit rate decompressed images.
In fact, it is interesting that the radiologist seemed slightly in favor of the appearances of microcalcifications and edges in the
compressed mammograms. The radiologist identified 20% of the compressed images at 0.1 bit rate as suffering from minor
blurry artifacts and 6% of the compressed images as possessing greater edge sharpness. Without using lossless compression for
microcalcifications, the radiologist could identify 20% of the microcalcifications on the compressed mammograms
at 0.1 bit rate as less sharp. The radiologist also found that 18% and 6% of the compressed images at 0.1 bit rate possess degraded
overall image quality and higher image noise, respectively. Degradation of image quality in compressed images at 0.1 bit rate
is highly associated with unsharpness of microcalcifications and edges. The image quality degradation at 0.1 bit rate is also
correlated with the size of the breast area. It is estimated that if the breast occupies more than one half of the whole mammogram,
degraded image quality and edge unsharpness would be observable by the radiologist.


Table I. Compression Ratios and Mean-Square-Errors of the Three Compression Modes.

Mode                   A                      B                      C
Procedure              0.3 bit/pixel          0.1 bit/pixel          0.1 bit/pixel
                       + lossless for spots   + lossless for spots
Average Bit Rate       0.43 bit/pixel         0.23 bit/pixel         0.1 bit/pixel
Compression Ratio      27:1                   52:1                   120:1
Mean Square Error      50.73                  102.72                 105.63
(Standard Deviation)   (36.81)                (62.48)                (63.97)
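
The compression ratios in Table I are consistent with dividing the 12 bits stored per pixel by the average bit rate. A short sketch of that arithmetic follows, together with the usual mean-square-error definition. The function names are illustrative, the pixel values used for the MSE are hypothetical, and the 12-bit basis is an inference from the table rather than something the text states.

```python
def compression_ratio(bits_per_pixel, bit_rate):
    """Ratio of original to compressed size for a given bit rate."""
    return bits_per_pixel / bit_rate


def mean_square_error(original, decompressed):
    """Average squared pixel difference between two equal-size images,
    each given as a flat list of pixel values."""
    if len(original) != len(decompressed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, decompressed)) / len(original)


# 12 of the 16 stored bits carry image data (Section 2).
ratios = {rate: compression_ratio(12, rate) for rate in (0.43, 0.23, 0.1)}
# The ratios come out near 27.9, 52.2 and 120, in line with Table I.

mse = mean_square_error([10, 20, 30, 40], [12, 20, 27, 40])  # hypothetical pixels
```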

Table II. Qualitative Measures by Comparing the Paired Images (Original and Compressed).

Rows with twelve entries list the counts for Type A, Type B, and Type C under each of the four quality measures, in the
order calcification observability / edge sharpness / overall image quality / overall noise pattern. Rows with fewer
entries are incomplete.

Original Worse Than Compressed:      ■ ■ ■ / 6 2 3 / 1 0 2 / 1 0 0
of which:                            3 ■ ■ 5 ■ ■ 0 0
  - slightly worse:                  4 0 1 0 3 1 0 0 0 0 0
  - moderately worse:                0 0 0 0 0 0 0 0 0 0 0
Original Better Than Compressed:     ■ ■ 0 ■
of which:
  - same, but in favor of original:  6 ■ ■ / ■ 6 9 / 1 3 2 / 0 2 0
  - slightly better:                 1 4 5 / 0 5 5 / 0 4 4 / 0 0 3
  - moderately better:               0 0 1 / 0 0 1 / 0 0 1 / 0 0 0
No Difference:                       36 16 14 / 40 12 7 / 48 18 16 / 49 23 22

Type A - Compression with preservation of suspicious calcifications; Compression rate: 0.43 bit/pixel (0.3+0.13); Total 50 Cases
Type B - Compression with preservation of suspicious calcifications; Compression rate: 0.23 bit/pixel (0.1+0.13); Total 25 Cases
Type C - Global compression; Compression rate: 0.1 bit/pixel; Total 25 Cases


5.  CONCLUSIONS  AND  DISCUSSION 

In this study, we show that an advanced image compression method can be integrated with a computer detection
technique. Since the computer detection technique can identify potential clinical ROIs, it is appropriate for the compression
program to treat these ROIs with a different compression strategy. This method is not designed to optimize the compression.
Rather, it is a realistic approach for the clinical usage of the compression technique in digitized or digital mammography. Using
this integrated approach, we found that mammograms compressed at 0.43 bit/pixel (i.e., 37:1) have the same visual quality
as the original mammograms. This was confirmed by the above just-noticeable-difference perception study.
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